
Matching functions
match_single.Rd
Functions used to match taxonomic names from a collection to external databases (POWO's WCVP, IUCN Red List).
Usage
match_single(
taxon_names,
enrich_database,
enrich_database_search_index,
enrich_taxon_name_column = "taxon_name",
enrich_display_in_message_column = "ID",
match_column = NA,
...
)
match_multiple(
taxon_names,
taxon_authors,
enrich_database,
enrich_database_search_index,
enrich_taxon_name_column = "taxon_name",
enrich_display_in_message_column = "ID",
enrich_plant_identifier_column = "ID",
match_column = NA,
...,
show_progress = TRUE
)
match_all_issue(
taxon_names,
taxon_authors = rep(NA, length(taxon_names)),
enrich_database,
matching_authors = BGSmartR::match_authors,
matching_criterion = BGSmartR::additional_wcvp_matching,
do_add_split = TRUE,
do_fix_hybrid = TRUE,
do_rm_autonym = TRUE,
enrich_taxon_name_column = "taxon_name",
enrich_taxon_authors_column = "taxon_authors_simp",
enrich_plant_identifier_column = "ID",
enrich_display_in_message_column = "ID",
...
)
match_typos(
taxon_names,
taxon_authors,
enrich_database,
enrich_taxon_name_column = "taxon_name",
single_indices = NA,
mult_indices = NA,
typo_method = "Data frame only",
do_match_multiple = TRUE,
...
)
no_match_cultivar_indet(taxon_names)
get_match_from_multiple(
taxon_name_and_author,
enrich_database_mult,
matching_authors = BGSmartR::match_authors,
matching_criterion = BGSmartR::no_additional_matching,
enrich_plant_identifier_column = "plant_name_id",
enrich_taxon_name_column = "taxon_name",
enrich_taxon_authors_column = "taxon_authors_simp",
enrich_taxon_author_words_column = "author_parts",
...
)
check_taxon_typo(
taxon_name,
enrich_database = NA,
enrich_taxon_name_column = "taxon_name",
typo_df = BGSmartR::typo_list,
typo_method = "Data frame only",
...
)
shorten_message(messages)
try_rm_autonym(
taxon_names,
enrich_database_taxon_names,
console_message = TRUE,
...
)
try_fix_infraspecific_level(
taxon_names,
enrich_database_taxon_names,
try_hybrid = TRUE,
console_message = TRUE,
...
)
try_fix_hybrid(
taxon_names,
enrich_database_taxon_names,
try_hybrid = TRUE,
console_message = TRUE,
...
)
Arguments
- taxon_names
Vector of taxonomic names.
- enrich_database
A data frame of enriching information that we want to match taxon_names to.
- enrich_database_search_index
A vector of indices of enrich_database that are desired to be matched to.
- enrich_taxon_name_column
The name of the column in enrich_database that corresponds to taxonomic names. Default value is "taxon_name".
- enrich_display_in_message_column
The name of the column in enrich_database that contains values to show in the matching messages. Default value is "ID".
- match_column
Either NA or the name of a column in enrich_database. The default value is NA, which means the values of the match are the indices of the matched records in the enrich database. If instead the values of a single column of enrich_database are wanted as the result of the match, the name of that column needs to be provided.
- ...
Arguments (i.e., attributes) used in the matching algorithm (passed along to nested functions). Examples include enrich_taxon_authors_column, enrich_display_in_message_column and enrich_plant_identifier_column.
- taxon_authors
A vector of full taxon names (corresponding to taxon_names).
- enrich_plant_identifier_column
The name of the column in enrich_database that corresponds to the record identifier. The defaults shown in Usage are "ID" for match_multiple() and match_all_issue(), and "plant_name_id" for get_match_from_multiple().
- show_progress
Flag (TRUE/FALSE) for whether to show a progress bar.
- matching_authors
The function used to find the best match using the author of taxonomic names. By default the function BGSmartR::match_authors() is used.
- matching_criterion
The function used to find the best match when we have 'non-unique' taxonomic names. The defaults shown in Usage are BGSmartR::additional_wcvp_matching() for match_all_issue() and BGSmartR::no_additional_matching() for get_match_from_multiple().
- do_add_split
Flag (TRUE/FALSE) for whether we search for missing f./var./subsp.
- do_fix_hybrid
Flag (TRUE/FALSE) for whether we search for hybrid issues.
- do_rm_autonym
Flag (TRUE/FALSE) for whether we try removing autonyms.
- enrich_taxon_authors_column
The name of the column in enrich_database that corresponds to the authors of taxonomic names. Default value is "taxon_authors_simp".
- single_indices
A vector of indices of enrich_database that correspond to the records that have 'unique' taxonomic names.
- mult_indices
A vector of indices of enrich_database that correspond to the records that have 'non-unique' taxonomic names.
- typo_method
One of 'All', 'Data frame only' or 'Data frame + common', detailing the level of typo finding required.
- do_match_multiple
Flag (TRUE/FALSE) for whether we attempt matching those found to have multiple taxonomic names in the enrich database.
- taxon_name_and_author
The pair of taxonomic name and combined taxonomic name and author.
- enrich_database_mult
enrich_database restricted to the rows that correspond to 'non-unique' taxonomic names.
- enrich_taxon_author_words_column
The name of the column in enrich_database that corresponds to the words contained in the authors of taxonomic names. Default value is "author_parts".
- taxon_name
A single taxonomic name.
- typo_df
A data frame where the first column is a taxonomic name with a typo and the second column is the corrected taxonomic name. By default BGSmartR::typo_list is used.
- messages
Messages detailing how a match is obtained.
- enrich_database_taxon_names
The taxon names taken from enrich_database.
- console_message
Flag (TRUE/FALSE) detailing whether to show messages in the console.
- try_hybrid
Flag (TRUE/FALSE) for whether hybrid fixes are attempted.
Details
Below we outline the uses of each function. For further details and examples on matching functions please see the Matching.Rmd vignette.
Each of the matching functions generally returns the index of the matching record in enrich_database and a message detailing how the match was obtained. These functions can be used as building blocks for a custom taxonomic name matching algorithm.
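As a rough illustration of the "index plus message" convention, the sketch below matches names against a toy enrich database using base R's match(). The data, the helper name toy_match() and the message wording are all hypothetical; this is not BGSmartR's implementation, only a picture of the kind of result described above.

```r
# Toy enrich database (hypothetical data; column names follow the defaults in Usage).
enrich_database <- data.frame(
  ID = c("a1", "a2", "a3"),
  taxon_name = c("Quercus robur", "Acer campestre", "Fagus sylvatica"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the row index of the first exact match plus a message.
toy_match <- function(taxon_names, enrich_database,
                      enrich_taxon_name_column = "taxon_name",
                      enrich_display_in_message_column = "ID") {
  # match() gives the position of the first exact match, or NA if there is none.
  idx <- match(taxon_names, enrich_database[[enrich_taxon_name_column]])
  shown <- enrich_database[[enrich_display_in_message_column]][idx]
  # Message wording is invented for this sketch.
  msg <- ifelse(is.na(idx), "", paste0("matched to record ", shown))
  list(match = idx, message = msg)
}

res <- toy_match(c("Acer campestre", "Rosa canina"), enrich_database)
res$match    # index into enrich_database, NA where nothing matched
res$message
```

Returning indices rather than values keeps every other column of the enrich database reachable after the match, which is one plausible reason for the NA default of match_column.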
match_single() matches taxon_names to enrich_database, taking only the first match. enrich_database_search_index should be used to restrict the enrich database to only 'unique' taxonomic names (i.e. taxonomic names that correspond to a single record in the enrich database). For 'non-unique' taxonomic names match_multiple() should be used.
match_multiple() matches taxon_names to enrich_database for entries in the enrich database that have 'non-unique' taxonomic names. For 'unique' taxonomic names match_single() should be used. For 'non-unique' taxonomic names we first use taxonomic author matching to decide which record to use. This matching is performed for each taxonomic name and author using the function get_match_from_multiple(). get_match_from_multiple() further depends on a matching criterion function, which can be set using the input matching_criterion (passed via ...). By default this is set to additional_wcvp_matching(), which uses accepted_plant_name_id and taxon_status to choose the best match (in WCVP).
match_all_issue() attempts to fix hybridisation, change infraspecific levels or remove autonyms to find matches in the enrich database. This function depends on the functions:
- try_rm_autonym() attempts to find taxonomic names in enrich_database by removing autonyms.
- try_fix_infraspecific_level() attempts to find taxonomic names in enrich_database by adding/changing/removing infraspecific levels (var., f., etc.).
- try_fix_hybrid() attempts to find taxonomic names in enrich_database by adding/changing/removing hybrid markers (+ or x).
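To make the autonym step more concrete, here is a minimal base R sketch of the idea: if the infraspecific epithet repeats the species epithet, drop the infraspecific part and look the shortened name up again. The helper strip_autonym(), the assumed four-word name shape and the toy data are illustrative assumptions, not how try_rm_autonym() is actually written.

```r
# Hypothetical helper: drop an autonym's infraspecific part, e.g.
# "Acer campestre subsp. campestre" -> "Acer campestre".
strip_autonym <- function(taxon_name) {
  words <- strsplit(taxon_name, " ", fixed = TRUE)[[1]]
  # Assumed shape: genus, species epithet, rank marker, infraspecific epithet.
  is_autonym <- length(words) == 4 &&
    words[3] %in% c("subsp.", "var.", "f.") &&
    words[4] == words[2]
  if (is_autonym) paste(words[1:2], collapse = " ") else taxon_name
}

# Toy inputs (illustrative strings only).
enrich_database_taxon_names <- c("Acer campestre", "Quercus robur")
taxon_names <- c("Acer campestre subsp. campestre", "Quercus robur var. alba")

candidates <- vapply(taxon_names, strip_autonym, character(1), USE.NAMES = FALSE)

# Accept a match only when the name actually changed and the new name is in the database.
idx <- ifelse(candidates != taxon_names,
              match(candidates, enrich_database_taxon_names),
              NA_integer_)
idx  # index of the shortened name in the database, NA if no autonym fix applied
```

The second name is left alone because its infraspecific epithet differs from the species epithet, so it is not an autonym.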
match_typos() attempts to find matches by searching for typos in the taxonomic name. This depends on the function:
- check_taxon_typo() to check a single taxonomic name for typos found either in a typo list or the enrich database.
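A typo-list lookup of the kind described for typo_df can be sketched in a few lines of base R. The table contents and the helper lookup_typo() are hypothetical, and this only approximates what a 'Data frame only' search might do; it does not attempt the database-wide search.

```r
# Hypothetical typo table: first column holds the typo, second the correction
# (the same two-column layout described for typo_df).
typo_df <- data.frame(
  typo = c("Quercus robar", "Acer campestris"),
  fixed = c("Quercus robur", "Acer campestre"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the corrected name, or NA if the name is not listed.
lookup_typo <- function(taxon_name, typo_df) {
  hit <- match(taxon_name, typo_df[[1]])
  if (is.na(hit)) NA_character_ else typo_df[[2]][hit]
}

fixed_name <- lookup_typo("Quercus robar", typo_df)
not_listed <- lookup_typo("Fagus sylvatica", typo_df)
```

A corrected name found this way would then be passed back through the ordinary matching step to obtain a record index.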
no_match_cultivar_indet() searches for cultivars and indeterminates and sets their match to -1, indicating no match.
shorten_message() compresses matching messages (details of how a match is found) into an easy-to-read format.
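The -1 convention for cultivars and indeterminates can be pictured with the sketch below. The detection rules used here (a quoted cultivar epithet, a trailing "sp." or the text "indet") and the use of NA for names left untouched are assumptions made for illustration; the patterns that no_match_cultivar_indet() really uses are not documented on this page.

```r
# Hypothetical helper: flag names that should be excluded from matching.
flag_no_match <- function(taxon_names) {
  # Assumption: a cultivar epithet is wrapped in single quotes, e.g. "Rosa 'Peace'".
  is_cultivar <- grepl("'[^']+'", taxon_names)
  # Assumption: indeterminates end in " sp." or contain "indet".
  is_indet <- grepl(" sp\\.$", taxon_names) |
    grepl("indet", taxon_names, ignore.case = TRUE)
  # -1 marks "no match"; NA (an assumption here) means "not decided yet".
  ifelse(is_cultivar | is_indet, -1L, NA_integer_)
}

flags <- flag_no_match(c("Rosa 'Peace'", "Quercus sp.", "Acer campestre"))
flags
```

Running a filter like this before the other matching steps avoids spending time on names that cannot have a species-level record.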