Match collection to an enrichment database via taxonomic names

Usage

match_collection_to_enrich_database(
  collection,
  enrich_database,
  taxon_name_column = NA,
  taxon_name_full_column = NA,
  taxon_author_column = NA,
  enrich_taxon_name_column = NA,
  enrich_taxon_authors_column = NA,
  typo_method = "All",
  do_add_split = TRUE,
  do_fix_hybrid = TRUE,
  do_rm_autonym = TRUE,
  do_rm_cultivar_indeterminates = TRUE,
  do_match_single = TRUE,
  do_match_multiple = TRUE,
  do_fix_taxon_name = TRUE,
  matching_criterion = BGSmartR::no_additional_matching,
  ...
)

Arguments

collection

A data frame containing a collection.

enrich_database

A data frame of enriching information.

taxon_name_column

The name of the column in the collection corresponding to taxonomic names.

taxon_name_full_column

The name of the column in the collection corresponding to joined taxonomic names and authors.

taxon_author_column

The name of the column in the collection corresponding to the authors of the taxonomic names.

enrich_taxon_name_column

The name of the column in enrich_database that corresponds to the taxonomic names.

enrich_taxon_authors_column

The name of the column in enrich_database that corresponds to the authors of taxonomic names.

typo_method

One of 'All', 'Data frame only', 'Data frame + common' or 'no', specifying the level of typo finding required.

do_add_split

Flag (TRUE/FALSE) for whether we search for missing f./var./subsp.

do_fix_hybrid

Flag (TRUE/FALSE) for whether we search for hybrid issues.

do_rm_autonym

Flag (TRUE/FALSE) for whether we try removing autonyms.

do_rm_cultivar_indeterminates

Flag (TRUE/FALSE) for whether we remove cultivars and indeterminates prior to taxonomic name matching.

do_match_single

Flag (TRUE/FALSE) for whether we do matching to unique taxonomic names in enrich_database.

do_match_multiple

Flag (TRUE/FALSE) for whether we do matching to non-unique taxonomic names in enrich_database.

do_fix_taxon_name

Flag (TRUE/FALSE) for whether to attempt to fix common issues in taxonomic names to aid matching. Individual groups of fixes can also be turned on or off using the inputs do_add_split, do_fix_hybrid and do_rm_autonym.

matching_criterion

A function used to choose the best method from extracts of the enrich_database.

...

Arguments (i.e., attributes) used in the matching algorithm (passed along to nested functions). Examples include enrich_display_in_message_column and enrich_plant_identifier_column.

Value

A list of length seven containing:

  • $match the index of the record in enrich_database which matches the record in the collection database.

  • $details_short a simplified message detailing the match.

  • $match_taxon_name a longer format message detailing the match.

  • $original_authors The author/s (extracted) from the collection database.

  • $match_authors The author/s of the matched record in enrich_database.

  • $author_check a message describing the similarity between the collection's taxon authors and the authors found in enrich_database: either Identical, Partial or Different (No Match if a match to enrich_database cannot be found). Author similarity is found using the function author_check().
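
As an illustration of how the returned list might be used, the sketch below builds a mock result by hand rather than calling the function. The values, the family column and the toy enrich_database are invented for illustration; only the element names $match and $author_check come from the list above. Unmatched records are not covered here.

```r
# Toy enrichment table; the `family` column is an illustrative assumption.
enrich_database <- data.frame(
  taxon_name = c("Bellis perennis", "Quercus robur", "Pseudotsuga menziesii"),
  family = c("Asteraceae", "Fagaceae", "Pinaceae")
)

# Hand-built stand-in for the function's return value (mock values).
result <- list(
  match = c(2, 1, 3),
  author_check = c("Identical", "Partial", "Identical")
)

# $match indexes rows of enrich_database, so it can pull enriching
# information across to the collection records.
matched_family <- enrich_database$family[result$match]

# Summarise how well the authors agreed across the collection.
author_summary <- table(result$author_check)
```

Here matched_family holds one family per collection record, in collection order.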

Details

This function allows matching of a collection's database to an enrichment database.

By default the function uses all the steps of our matching algorithm; for details see the vignette Matching.Rmd (Method of Matching taxonomic records). If parts of the algorithm are not required, they can be switched off using typo_method, do_add_split, do_fix_hybrid, do_rm_autonym, do_rm_cultivar_indeterminates, do_match_single, do_match_multiple and do_fix_taxon_name. Moreover, by default no custom matching is performed. A user-supplied custom matching criterion (a function) can be added via the input matching_criterion.
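
A minimal sketch of a call with some steps switched off is given below. The argument names are taken from the Usage section; the column names ("TaxonName", "TaxonNameFull", "taxon_name", "taxon_authors") and the toy collection are assumptions for illustration. The call only runs if BGSmartR is installed and an object called enrich_database, already prepared for matching, exists in the session.

```r
# Toy collection; the column names are illustrative assumptions.
collection <- data.frame(
  TaxonName = c("Quercus robur", "Bellis perennis"),
  TaxonNameFull = c("Quercus robur L.", "Bellis perennis L.")
)

# `enrich_database` is assumed to have been created and prepared elsewhere.
if (requireNamespace("BGSmartR", quietly = TRUE) && exists("enrich_database")) {
  result <- BGSmartR::match_collection_to_enrich_database(
    collection = collection,
    enrich_database = enrich_database,
    taxon_name_column = "TaxonName",
    taxon_name_full_column = "TaxonNameFull",
    enrich_taxon_name_column = "taxon_name",        # assumed column name
    enrich_taxon_authors_column = "taxon_authors",  # assumed column name
    do_fix_hybrid = FALSE,     # skip the search for hybrid issues
    do_match_multiple = FALSE  # skip matching to non-unique taxon names
  )
}
```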

To perform the matching you must specify the name of the taxon name column in the enrichment database (enrich_taxon_name_column). If author matching is required, the author column must also be specified for the enrichment database (enrich_taxon_authors_column).

The enrichment database must contain some columns required for matching (single_entry, taxon_length, etc.). We advise using prepare_enrich_database() to add these columns.

Similarly, you must specify the name of the taxon name column in the collection database (taxon_name_column). If author matching is desired then you have two choices:

  • Specify the taxon author column, taxon_author_column.

  • Specify the combined taxon name and author column, taxon_name_full_column. The authors are then extracted by removing the words found in the taxon name from the combined name.

Note that if both are specified, the authors from taxon_author_column are used.
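
The author extraction described for taxon_name_full_column can be sketched in base R as follows. The helper extract_authors_sketch() is hypothetical and is not part of BGSmartR; it only illustrates the idea of dropping the words of the taxon name from the combined string, and the package's own implementation may differ.

```r
# Hypothetical helper (not a BGSmartR function): remove every word of the
# taxon name from the combined string and treat what is left as the authors.
extract_authors_sketch <- function(taxon_name, taxon_name_full) {
  name_words <- strsplit(taxon_name, " ", fixed = TRUE)[[1]]
  full_words <- strsplit(taxon_name_full, " ", fixed = TRUE)[[1]]
  paste(full_words[!full_words %in% name_words], collapse = " ")
}

extract_authors_sketch("Quercus robur", "Quercus robur L.")
# [1] "L."
extract_authors_sketch("Pseudotsuga menziesii", "Pseudotsuga menziesii (Mirb.) Franco")
# [1] "(Mirb.) Franco"
```

Because the comparison is word by word, an author word that happens to equal a word in the taxon name would also be dropped; this sketch does not guard against that case.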