Prepare enrichment database for matching

Usage

prepare_enrich_database(
  enrich_database,
  enrich_taxon_name_column = "taxon_names",
  enrich_taxon_authors_column = NA,
  enrich_taxon_name_full_column = NA,
  do_sanitise = TRUE,
  do_taxon_length = TRUE,
  do_single_entry = TRUE,
  do_author_parts = TRUE,
  do_add_id = TRUE,
  do_sort = TRUE,
  console_message = FALSE
)

Arguments

enrich_database

A data frame of enriching information.

enrich_taxon_name_column

The name of the column in enrich_database that corresponds to taxonomic names. Default value is taxon_names.

enrich_taxon_authors_column

The name of the column in enrich_database that corresponds to the authors of taxonomic names.

enrich_taxon_name_full_column

The name of the column in enrich_database that corresponds to the taxonomic names joined with their authors.

do_sanitise

Flag (TRUE/FALSE) detailing whether to add the columns sanitise_name, sanitise_author and require_sanitise corresponding to the sanitised taxonomic name, sanitised author and a flag (TRUE/FALSE) of whether the taxonomic name needed sanitising.

do_taxon_length

Flag (TRUE/FALSE) detailing whether to add the column taxon_length containing the string length of the taxonomic names.

do_single_entry

Flag (TRUE/FALSE) detailing whether to add the column single_entry, which records whether the taxonomic name appears only once in enrich_database (TRUE) or multiple times (FALSE).

do_author_parts

Flag (TRUE/FALSE) detailing whether to add the column author_parts containing the words extracted from the taxonomic author. This is used for partial author matching.

do_add_id

Flag (TRUE/FALSE) detailing whether to add the column ID containing a unique identifier for each record in enrich_database.

do_sort

Flag (TRUE/FALSE) detailing whether to sort enrich_database alphabetically by taxonomic name.

console_message

Flag (TRUE/FALSE) detailing whether to show messages in the console.

Value

enrich_database with additional columns used when matching to taxonomic names.

Details

This function adds columns to the enrichment database used when matching to taxonomic names. By default this includes:

  • sanitise_name The sanitised taxonomic name.

  • sanitise_author The sanitised author of the taxonomic name.

  • require_sanitise A logical column (TRUE/FALSE) for whether the taxonomic name required sanitising.

  • author_parts Words found in the taxonomic author (removing initials and punctuation) used when performing partial author matching.

  • taxon_length The length of the taxonomic name (string length) used if typo searching is performed in the matching algorithm.

  • single_entry A logical column (TRUE/FALSE) for whether the taxonomic name is unique in enrich_database, used to restrict records when performing either single matching or multiple matching.

  • ID A column of unique identifiers for each record in enrich_database. Used for referencing the matches.

Moreover, the function sorts enrich_database in alphabetical order of the taxonomic names.

These additional columns and the sorting can be switched on/off using the inputs do_sanitise, do_taxon_length, do_single_entry, do_author_parts, do_add_id and do_sort.

Note that if sanitising is performed then sorting and single_entry will use the sanitised taxonomic names and not the inputted names (enrich_database[[enrich_taxon_name_column]]). Similarly, sanitised authors will be used when creating author parts.
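The interaction between sanitising, single_entry, sorting and ID can be illustrated with a minimal base R sketch. This is not the package's implementation: simple_sanitise is a hypothetical stand-in that only normalises letter case (the real sanitising rules are likely broader), and computing taxon_length from the sanitised name is an assumption.

```r
# Hypothetical stand-in for the package's sanitiser: capitalise the first
# letter and lower-case the rest. The real sanitising rules may differ.
simple_sanitise <- function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}

enrich_database <- data.frame(
  taxon_names = c("Abies taxifolia", "ABIES taxifolia", "Acalypha gracilens",
                  "Eupatorium magdalenae", "Adina racemosa"),
  stringsAsFactors = FALSE
)

# Sanitised names, and a flag for records whose name had to change
sanitise_name <- simple_sanitise(enrich_database$taxon_names)
enrich_database$sanitise_name <- sanitise_name
enrich_database$require_sanitise <- sanitise_name != enrich_database$taxon_names

# String length of each name (assumption: measured on the sanitised name)
enrich_database$taxon_length <- nchar(sanitise_name)

# single_entry is TRUE when the sanitised name occurs exactly once
name_counts <- table(sanitise_name)
enrich_database$single_entry <- as.vector(name_counts[sanitise_name]) == 1

# Sort by sanitised name, then number the records in sorted order
enrich_database <- enrich_database[order(enrich_database$sanitise_name), ]
enrich_database$ID <- seq_len(nrow(enrich_database))

enrich_database
```

Because 'ABIES taxifolia' sanitises to 'Abies taxifolia', both records are flagged as non-unique here even though the inputted strings differ, which is the behaviour the note above describes.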

Examples

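# Example taxonomic names and authors (note the inconsistent case in 'ABIES taxifolia')
taxon_names = c('Abies taxifolia', 'ABIES taxifolia',
                'Acalypha gracilens', 'Eupatorium magdalenae', 'Adina racemosa')
taxon_authors = c('Duhamel', 'Poir.', 'A.Gray', 'Stehlé', '(Siebold & Zucc.) Miq.')

# Build the enrichment database with an arbitrary extra column
enrich_database = data.frame(taxon_names, taxon_authors, value = runif(5))

# Add the matching columns using the default flags
prepare_enrich_database(enrich_database,
                        enrich_taxon_name_column = 'taxon_names',
                        enrich_taxon_authors_column = 'taxon_authors')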
#>   ID           taxon_names          taxon_authors      value
#> 1  1       Abies taxifolia                Duhamel 0.73531960
#> 2  2       ABIES taxifolia                  Poir. 0.19595673
#> 3  3    Acalypha gracilens                 A.Gray 0.98053967
#> 5  4        Adina racemosa (Siebold & Zucc.) Miq. 0.05144628
#> 4  5 Eupatorium magdalenae                 Stehlé 0.74152153
#>           sanitise_name        sanitise_author require_sanitise taxon_length
#> 1       Abies taxifolia                Duhamel            FALSE           15
#> 2       Abies taxifolia                  Poir.             TRUE           15
#> 3    Acalypha gracilens                 A.Gray            FALSE           18
#> 5        Adina racemosa (Siebold & Zucc.) Miq.            FALSE           14
#> 4 Eupatorium magdalenae                 Stehle            FALSE           21
#>   single_entry       author_parts
#> 1        FALSE            Duhamel
#> 2        FALSE               Poir
#> 3         TRUE               Gray
#> 5         TRUE Siebold, Zucc, Miq
#> 4         TRUE             Stehle
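
The author_parts values in the output above are consistent with stripping punctuation and single-letter initials from the sanitised author. The following base R sketch reproduces them for these five authors. The helper author_words is hypothetical and is not the package's code; treating every single-letter token as an initial and joining the words with ", " are assumptions based only on the printed output.

```r
# Hypothetical helper: extract the words of one author string.
author_words <- function(author) {
  # Replace every run of non-letter characters with a single space
  cleaned <- gsub("[^[:alpha:]]+", " ", author)
  words <- strsplit(trimws(cleaned), " +")[[1]]
  # Drop single-letter tokens, treated here as initials (an assumption)
  words <- words[nchar(words) > 1]
  paste(words, collapse = ", ")
}

# Sanitised authors as shown in the sanitise_author column above
sanitise_author <- c("Duhamel", "Poir.", "A.Gray",
                     "(Siebold & Zucc.) Miq.", "Stehle")

author_parts <- vapply(sanitise_author, author_words, character(1),
                       USE.NAMES = FALSE)
author_parts
```

A rule this simple would also keep connecting words such as "ex", so it should be read as an illustration of partial author matching input rather than a complete treatment of author strings.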