
Prepare enrichment database for matching
Usage
prepare_enrich_database(
  enrich_database,
  enrich_taxon_name_column = "taxon_names",
  enrich_taxon_authors_column = NA,
  enrich_taxon_name_full_column = NA,
  do_sanitise = TRUE,
  do_taxon_length = TRUE,
  do_single_entry = TRUE,
  do_author_parts = TRUE,
  do_add_id = TRUE,
  do_sort = TRUE,
  console_message = FALSE
)
Arguments
- enrich_database
A data frame of enriching information.
- enrich_taxon_name_column
The name of the column in enrich_database that corresponds to taxonomic names. Default value is taxon_names.
- enrich_taxon_authors_column
The name of the column in enrich_database that corresponds to the authors of taxonomic names.
- enrich_taxon_name_full_column
The name of the column in enrich_database corresponding to joined taxonomic names and authors.
- do_sanitise
Flag (TRUE/FALSE) detailing whether to add the columns sanitise_name, sanitise_author and require_sanitise, corresponding to the sanitised taxonomic name, the sanitised author and a flag (TRUE/FALSE) of whether the taxonomic name needed sanitising.
- do_taxon_length
Flag (TRUE/FALSE) detailing whether to add the column taxon_length containing the string length of the taxonomic names.
- do_single_entry
Flag (TRUE/FALSE) detailing whether to add the column single_entry indicating whether the taxonomic name appears only once in enrich_database.
- do_author_parts
Flag (TRUE/FALSE) detailing whether to add the column author_parts containing the words extracted from the taxonomic author. This is used for partial author matching.
- do_add_id
Flag (TRUE/FALSE) detailing whether to add the column ID containing a unique identifier for each record in enrich_database.
- do_sort
Flag (TRUE/FALSE) detailing whether to sort enrich_database alphabetically by taxonomic name.
- console_message
Flag (TRUE/FALSE) detailing whether to show messages in the console.
Details
This function adds the columns to the enrichment database that are used when matching taxonomic names. By default these are:
- sanitise_name
The sanitised taxonomic name.
- sanitise_author
The sanitised author of the taxonomic name.
- require_sanitise
A logical column (TRUE/FALSE) for whether the taxonomic name required sanitising.
- author_parts
Words found in the taxonomic author (removing initials and punctuation), used when performing partial author matching.
- taxon_length
The length of the taxonomic name (string length), used if typo searching is performed in the matching algorithm.
- single_entry
A logical column (TRUE/FALSE) for whether the taxonomic name is unique in enrich_database, used to restrict records when performing either single matching or multiple matching.
- ID
A column of unique identifiers for each record in enrich_database, used for referencing the matches.
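As a rough illustration of how words could be pulled out of an author string for author_parts, here is a base-R sketch. The helper author_parts_sketch is hypothetical and is not the package's implementation. It assumes ASCII (already sanitised) authors and treats every single-letter token as an initial, which would also discard one-letter author abbreviations such as L.

```r
# Hypothetical helper, not the package's implementation: split each author
# string into the words used for partial author matching.
author_parts_sketch <- function(authors) {
  lapply(authors, function(a) {
    if (is.na(a)) return(character(0))  # no author recorded
    # Split on every run of characters that are not ASCII letters
    words <- strsplit(a, "[^A-Za-z]+")[[1]]
    # Drop empty strings (e.g. from a leading "(") and single letters,
    # which this sketch treats as initials
    words[nchar(words) > 1]
  })
}

# Sanitised authors as shown in the Examples output
authors <- c("Duhamel", "Poir.", "A.Gray", "Stehle", "(Siebold & Zucc.) Miq.")
parts <- author_parts_sketch(authors)
sapply(parts, paste, collapse = ", ")
```

Pasting each element together gives "Duhamel", "Poir", "Gray", "Stehle" and "Siebold, Zucc, Miq", the same words as the author_parts column in the Examples output.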
In addition, the function sorts enrich_database alphabetically by taxonomic name.
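A minimal base-R sketch of a sort-then-number step, assuming IDs are assigned after sorting. That is what the Examples output suggests: the row names keep each record's original position while ID runs from 1 to 5. The names below are the sanitise_name values from that example; method = "radix" is an assumption of this sketch, chosen so the ordering is the same in every locale.

```r
# Sanitised names from the Examples output, in their original record order
db <- data.frame(
  sanitise_name = c("Abies taxifolia", "Abies taxifolia", "Acalypha gracilens",
                    "Eupatorium magdalenae", "Adina racemosa"),
  stringsAsFactors = FALSE
)

# order() gives the row indices that sort the names alphabetically; it is
# stable, so tied names (the two Abies records) keep their original order.
# method = "radix" sorts in the C locale, independent of locale settings.
db <- db[order(db$sanitise_name, method = "radix"), , drop = FALSE]

# Number the records after sorting: ID follows the new order, while the
# row names still show each record's original position
db$ID <- seq_len(nrow(db))
db
```

Printing db shows row names 1, 2, 3, 5, 4 alongside IDs 1 to 5, the same pattern as in the Examples output.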
These additional columns can be switched on/off using the inputs do_sanitise, do_taxon_length, do_single_entry, do_author_parts and do_add_id; sorting is controlled by do_sort.
Note that if sanitising is performed, then sorting and single_entry are based on the sanitised taxonomic names rather than the input names (enrich_database[[enrich_taxon_name_column]]). Similarly, sanitised authors are used when creating author_parts.
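To see why it matters which names are used, the hypothetical helper single_entry below marks a name as unique only when no other record shares it. The sanitised names are copied from the Examples output rather than produced by the package's own sanitiser, which this sketch does not reproduce.

```r
# Hypothetical helper: TRUE where a name occurs exactly once in x.
# duplicated() only flags the later copies, so it is applied in both
# directions to flag every copy of a repeated name.
single_entry <- function(x) {
  !(duplicated(x) | duplicated(x, fromLast = TRUE))
}

input_names     <- c("Abies taxifolia", "ABIES taxifolia", "Acalypha gracilens")
# Sanitised versions as shown in the Examples output
sanitised_names <- c("Abies taxifolia", "Abies taxifolia", "Acalypha gracilens")

single_entry(input_names)      # TRUE TRUE TRUE   (case variants look distinct)
single_entry(sanitised_names)  # FALSE FALSE TRUE (matches the Examples output)
nchar(sanitised_names)         # 15 15 18         (the taxon_length values)
```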
Examples
taxon_names = c('Abies taxifolia', 'ABIES taxifolia',
                'Acalypha gracilens', 'Eupatorium magdalenae', 'Adina racemosa')
taxon_authors = c('Duhamel', 'Poir.', 'A.Gray', 'Stehlé', '(Siebold & Zucc.) Miq.')
enrich_database = data.frame(taxon_names, taxon_authors, value = runif(5))
prepare_enrich_database(enrich_database,
                        enrich_taxon_name_column = 'taxon_names',
                        enrich_taxon_authors_column = 'taxon_authors')
#> ID taxon_names taxon_authors value
#> 1 1 Abies taxifolia Duhamel 0.73531960
#> 2 2 ABIES taxifolia Poir. 0.19595673
#> 3 3 Acalypha gracilens A.Gray 0.98053967
#> 5 4 Adina racemosa (Siebold & Zucc.) Miq. 0.05144628
#> 4 5 Eupatorium magdalenae Stehlé 0.74152153
#> sanitise_name sanitise_author require_sanitise taxon_length
#> 1 Abies taxifolia Duhamel FALSE 15
#> 2 Abies taxifolia Poir. TRUE 15
#> 3 Acalypha gracilens A.Gray FALSE 18
#> 5 Adina racemosa (Siebold & Zucc.) Miq. FALSE 14
#> 4 Eupatorium magdalenae Stehle FALSE 21
#> single_entry author_parts
#> 1 FALSE Duhamel
#> 2 FALSE Poir
#> 3 TRUE Gray
#> 5 TRUE Siebold, Zucc, Miq
#> 4 TRUE Stehle