
Match collection to an enrichment database via taxonomic names
Usage
match_collection_to_enrich_database(
  collection,
  enrich_database,
  taxon_name_column = NA,
  taxon_name_full_column = NA,
  taxon_author_column = NA,
  enrich_taxon_name_column = NA,
  enrich_taxon_authors_column = NA,
  typo_method = "All",
  do_add_split = TRUE,
  do_fix_hybrid = TRUE,
  do_rm_autonym = TRUE,
  do_rm_cultivar_indeterminates = TRUE,
  do_match_single = TRUE,
  do_match_multiple = TRUE,
  do_fix_taxon_name = TRUE,
  matching_criterion = BGSmartR::no_additional_matching,
  ...
)
Arguments
- collection
A data frame containing a collection.
- enrich_database
A data frame of enriching information.
- taxon_name_column
The name of the column in the collection corresponding to taxonomic names.
- taxon_name_full_column
The name of the column in the collection corresponding to joined taxonomic names and authors.
- taxon_author_column
The name of the column in the collection corresponding to the authors of the taxonomic names.
- enrich_taxon_name_column
The name of the column in enrich_database that corresponds to the taxonomic names.
- enrich_taxon_authors_column
The name of the column in enrich_database that corresponds to the authors of taxonomic names.
- typo_method
Either 'All', 'Data frame only', 'Data frame + common' or no; detailing the level of typo finding required.
- do_add_split
Flag (TRUE/FALSE) for whether we search for missing f./var./subsp.
- do_fix_hybrid
Flag (TRUE/FALSE) for whether we search for hybrid issues.
- do_rm_autonym
Flag (TRUE/FALSE) for whether we try removing autonyms.
- do_rm_cultivar_indeterminates
Flag (TRUE/FALSE) for whether we remove cultivars and indeterminates prior to taxonomic name matching.
- do_match_single
Flag (TRUE/FALSE) for whether we do matching to unique taxonomic names in enrich_database.
- do_match_multiple
Flag (TRUE/FALSE) for whether we do matching to non-unique taxonomic names in enrich_database.
- do_fix_taxon_name
Flag (TRUE/FALSE) for whether to attempt to fix common issues in taxonomic names to aid matching. Sections of common issue fixes can also be turned on/off using the inputs do_add_split, do_fix_hybrid, do_rm_autonym.
- matching_criterion
A function used to choose the best method from extracts of the enrich_database.
- ...
Arguments (i.e., attributes) used in the matching algorithm (passed along to nested functions). Examples include enrich_display_in_message_column and enrich_plant_identifier_column.
Value
A list of length seven containing:
- $match
The index of the record in enrich_database which matches the record in the collection database.
- $details_short
A simplified message detailing the match.
- $match_taxon_name
A longer format message detailing the match.
- $original_authors
The author/s (extracted) from the collection database.
- $match_authors
The author/s of the matched record in enrich_database.
- $author_check
Either Identical, Partial or Different (No Match if a match to enrich_database cannot be found). A message describing the similarity of the collection's taxon authors and the authors found in enrich_database. Author similarity is found using the function author_check().
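As a rough illustration of how the returned list might be consumed, the sketch below uses a hand-built stand-in for part of the output rather than a real call. The data values and column names are invented for illustration, and representing an unmatched record by NA in $match is an assumption that is not stated above.

```r
# Hypothetical collection and enrichment data (illustrative values only).
collection <- data.frame(
  TaxonName = c("Quercus robur", "Acer campestre", "Unknown plant"),
  stringsAsFactors = FALSE
)
enrich_database <- data.frame(
  taxon_name    = c("Acer campestre", "Quercus robur"),
  taxon_authors = c("L.", "L."),
  stringsAsFactors = FALSE
)

# Hand-built stand-in for two elements of the returned list.
# Assumption: records without a match carry NA in $match.
result <- list(
  match        = c(2L, 1L, NA),
  author_check = c("Identical", "Identical", "No Match")
)

# Index enrich_database by $match; an NA index yields a row of NA values.
enriched <- cbind(
  collection,
  enrich_database[result$match, , drop = FALSE],
  author_check = result$author_check
)
rownames(enriched) <- NULL
```

Each row of enriched then pairs a collection record with the enrichment record it was matched to, with NA in the enrichment columns where no match was found.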
Details
This function allows matching of a collection's database to an enrichment database.
By default the function uses all the steps of our matching algorithm; for details see the vignette Matching.Rmd (Method of Matching taxonomic records). If parts of the algorithm are not required, they can be switched off using typo_method, do_add_split, do_fix_hybrid, do_rm_autonym, do_rm_cultivar_indeterminates, do_match_single, do_match_multiple and do_fix_taxon_name.
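A call that switches off some steps might look like the sketch below. It only wraps the call in a function, so nothing is run until it is given real data. The wrapper name match_reduced and the column names "TaxonName" and "taxon_name" are hypothetical; the argument names and the typo_method value come from the Usage and Arguments sections.

```r
# Sketch only: a wrapper that disables some steps of the matching algorithm.
# "TaxonName" and "taxon_name" are hypothetical column names.
match_reduced <- function(collection, enrich_database) {
  BGSmartR::match_collection_to_enrich_database(
    collection = collection,
    enrich_database = enrich_database,
    taxon_name_column = "TaxonName",
    enrich_taxon_name_column = "taxon_name",
    typo_method = "Data frame only",  # one of the documented typo_method levels
    do_rm_autonym = FALSE,            # skip the autonym removal step
    do_match_multiple = FALSE         # skip matching to non-unique names
  )
}
```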
Moreover, by default no custom matching is performed. A user-supplied custom matching criterion (function) can be added via the input matching_criterion.
To perform the matching you must specify the name of the taxon name column in the enrichment database (enrich_taxon_name_column). If author matching is required then the author column must also be specified for the enrichment database (enrich_taxon_authors_column).
The enrichment database must have some columns required for matching (single_entry, taxon_length, etc.); we advise using prepare_enrich_database() to add these columns.
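Before matching, it can be worth checking that the enrichment data frame already carries these columns. The base R sketch below checks only the two columns named above, since the full set is not listed here, and the example data frame is hypothetical.

```r
# Hypothetical enrichment data that has not been prepared yet.
enrich_database <- data.frame(
  taxon_name = c("Acer campestre", "Quercus robur"),
  stringsAsFactors = FALSE
)

# Only the two columns named in the text; the full required set is not listed.
required_columns <- c("single_entry", "taxon_length")

# Report any of those columns that are absent from the data frame.
missing_columns <- setdiff(required_columns, names(enrich_database))
if (length(missing_columns) > 0) {
  message(
    "enrich_database is missing: ",
    paste(missing_columns, collapse = ", "),
    "; consider running prepare_enrich_database() first."
  )
}
```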
Similarly, you must specify the name of the taxon name column in the collection database (taxon_name_column). If author matching is desired then you have two choices:
- Specify the taxon author column, taxon_author_column.
- Specify the combined taxon name and author column, taxon_name_full_column; the authors are then extracted by removing the words found in the taxon name from the full taxon name.
Note that if both are specified then the authors from taxon_author_column are used.
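The idea behind the second choice, removing the words of the taxon name from the combined string, can be sketched in base R as follows. This is a minimal illustration and not the package's own implementation; the helper name extract_authors and the example strings are hypothetical.

```r
# Illustrative helper (hypothetical name): drop the words of the taxon name
# from the combined name-and-author string, leaving the author part.
# Works on one record at a time; it is not vectorised.
extract_authors <- function(taxon_name_full, taxon_name) {
  full_words <- strsplit(taxon_name_full, " ", fixed = TRUE)[[1]]
  name_words <- strsplit(taxon_name, " ", fixed = TRUE)[[1]]
  paste(full_words[!full_words %in% name_words], collapse = " ")
}

authors_simple <- extract_authors("Quercus robur L.", "Quercus robur")
authors_infra <- extract_authors(
  "Acer campestre var. leiocarpum (Opiz) Wallr.",
  "Acer campestre var. leiocarpum"
)
```

Here authors_simple is "L." and authors_infra is "(Opiz) Wallr.". A word-based removal like this would also drop an author word that happens to equal a word in the taxon name, which is one reason supplying taxon_author_column directly may be preferable when it is available.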