
Prepare enrichment database for matching
Prepare enrichment database for matching
Usage
prepare_enrich_database(
  enrich_database,
  enrich_taxon_name_column = "taxon_names",
  enrich_taxon_authors_column = NA,
  enrich_taxon_name_full_column = NA,
  do_sanitise = TRUE,
  do_taxon_length = TRUE,
  do_single_entry = TRUE,
  do_author_parts = TRUE,
  do_add_id = TRUE,
  do_sort = TRUE,
  console_message = FALSE
)
Arguments
- enrich_database
A data frame of enriching information.
- enrich_taxon_name_column
The name of the column in enrich_database that corresponds to taxonomic names. Default value is taxon_names.
- enrich_taxon_authors_column
The name of the column in enrich_database that corresponds to the authors of taxonomic names.
- enrich_taxon_name_full_column
The name of the column in enrich_database corresponding to joined taxonomic names and authors.
- do_sanitise
Flag (TRUE/FALSE) detailing whether to add the columns sanitise_name, sanitise_author and require_sanitise, corresponding to the sanitised taxonomic name, the sanitised author and a flag (TRUE/FALSE) of whether the taxonomic name needed sanitising.
- do_taxon_length
Flag (TRUE/FALSE) detailing whether to add the column taxon_length containing the string length of the taxonomic names.
- do_single_entry
Flag (TRUE/FALSE) detailing whether to add the column single_entry indicating whether the taxonomic name appears only once in enrich_database.
- do_author_parts
Flag (TRUE/FALSE) detailing whether to add the column author_parts containing the words extracted from the taxonomic author. This is used for partial author matching.
- do_add_id
Flag (TRUE/FALSE) detailing whether to add the column ID containing a unique identifier for each record in enrich_database.
- do_sort
Flag (TRUE/FALSE) detailing whether to alphabetically sort the taxonomic names in enrich_database.
- console_message
Flag (TRUE/FALSE) detailing whether to show messages in the console.
Details
This function adds columns to the enrichment database used when matching to taxonomic names. By default this includes:
sanitise_name
The sanitised taxonomic name.
sanitise_author
The sanitised author of the taxonomic name.
require_sanitise
A logical column (TRUE/FALSE) for whether the taxonomic name required sanitising.
author_parts
Words found in the taxonomic author (removing initials and punctuation), used when performing partial author matching.
taxon_length
The length of the taxonomic name (string length), used if typo searching is performed in the matching algorithm.
single_entry
A logical column (TRUE/FALSE) for whether the taxonomic name is unique in enrich_database, used to restrict records when performing either single matching or multiple matching.
ID
A column of unique identifiers for each record in enrich_database, used for referencing the matches.
Moreover, the function sorts the enrichment database alphabetically by taxonomic name.
These additional columns, and the sorting, can be switched on or off using the inputs do_sanitise, do_taxon_length, do_single_entry, do_author_parts, do_add_id and do_sort.
Note that if sanitising is performed, then sorting and single_entry are based on the sanitised taxonomic names and not the inputted names (enrich_database[[enrich_taxon_name_column]]). Similarly, sanitised authors are used when creating author parts.
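To make the meaning of the added columns concrete, the following base-R sketch derives rough equivalents by hand. It is only an illustration under stated assumptions: sanitise_sketch and author_parts_sketch are hypothetical helpers invented here, the sanitising rule (capitalise the first word) is a guess based on the example output, and the package's actual rules may well differ. Sorting and ID creation are left out.

```r
# Illustrative base-R sketch only; the package's real rules may differ.
enrich_database <- data.frame(
  taxon_names = c("Abies taxifolia", "ABIES taxifolia",
                  "Acalypha gracilens", "Adina racemosa"),
  taxon_authors = c("Duhamel", "Poir.", "A.Gray", "(Siebold & Zucc.) Miq."),
  stringsAsFactors = FALSE
)

# Hypothetical sanitiser (assumption): capitalise the first letter of the
# first word and lower-case the rest of that word; later words are unchanged.
sanitise_sketch <- function(x) {
  first_word <- sub(" .*$", "", x)
  remainder <- substring(x, nchar(first_word) + 1)
  paste0(toupper(substring(first_word, 1, 1)),
         tolower(substring(first_word, 2)),
         remainder)
}

# Hypothetical author splitter (assumption): break on non-letters, then drop
# empty strings and single-letter initials.
author_parts_sketch <- function(author) {
  words <- strsplit(author, "[^[:alpha:]]+")[[1]]
  paste(words[nchar(words) > 1], collapse = ", ")
}

prepared <- enrich_database
prepared$sanitise_name <- sanitise_sketch(prepared$taxon_names)
prepared$require_sanitise <- prepared$sanitise_name != prepared$taxon_names
prepared$taxon_length <- nchar(prepared$taxon_names)
# A name counts as a single entry only if its sanitised form occurs exactly once.
prepared$single_entry <- !(duplicated(prepared$sanitise_name) |
                           duplicated(prepared$sanitise_name, fromLast = TRUE))
prepared$author_parts <- vapply(prepared$taxon_authors, author_parts_sketch,
                                character(1), USE.NAMES = FALSE)

print(prepared[, c("sanitise_name", "require_sanitise",
                   "single_entry", "author_parts")])
```

In this sketch the two spellings of Abies taxifolia share one sanitised name, so both rows get single_entry = FALSE, which mirrors the note above about uniqueness being judged on sanitised names.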
Examples
taxon_names <- c('Abies taxifolia', 'ABIES taxifolia',
                 'Acalypha gracilens', 'Eupatorium magdalenae', 'Adina racemosa')
taxon_authors <- c('Duhamel', 'Poir.', 'A.Gray', 'Stehlé', '(Siebold & Zucc.) Miq.')
enrich_database <- data.frame(taxon_names, taxon_authors, value = runif(5))
prepare_enrich_database(enrich_database,
                        enrich_taxon_name_column = 'taxon_names',
                        enrich_taxon_authors_column = 'taxon_authors')
#>   ID           taxon_names          taxon_authors      value
#> 1  1       Abies taxifolia                Duhamel 0.73531960
#> 2  2       ABIES taxifolia                  Poir. 0.19595673
#> 3  3    Acalypha gracilens                 A.Gray 0.98053967
#> 5  4        Adina racemosa (Siebold & Zucc.) Miq. 0.05144628
#> 4  5 Eupatorium magdalenae                 Stehlé 0.74152153
#>           sanitise_name        sanitise_author require_sanitise taxon_length
#> 1       Abies taxifolia                Duhamel            FALSE           15
#> 2       Abies taxifolia                  Poir.             TRUE           15
#> 3    Acalypha gracilens                 A.Gray            FALSE           18
#> 5        Adina racemosa (Siebold & Zucc.) Miq.            FALSE           14
#> 4 Eupatorium magdalenae                 Stehle            FALSE           21
#>   single_entry       author_parts
#> 1        FALSE            Duhamel
#> 2        FALSE               Poir
#> 3         TRUE               Gray
#> 5         TRUE Siebold, Zucc, Miq
#> 4         TRUE             Stehle