Using BGSmartR

library(BGSmartR)

Example

In this example we start from an example collection, enrich it with botanical resources, and create reports using the enriched information.

First let’s load the collection.

collection = BGSmartR::collection_example

collection |> DT::datatable(rownames = FALSE)

We see that our collection dataset has 123 records and 8 columns including:

ItemAccNoFull: The item and accession ID of the plant.
AccNoFull: The accession ID of the plant.
AccYear: The year the plant was accessioned.
ProvenanceCode: The provenance of the accession.
TaxonName: The taxonomic name of the plant (without authority).
TaxonNameFull: The taxonomic name of the plant with authority.
ItemStatusType: The status of the plant (either “Existing” or “NotExisting”).
ItemStatusDate: The date of the last status update.

These columns are fairly standard in living collection records. A collection may contain further information (species, genus, family, etc.), which can be used later in the pipeline when creating reports.
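To make the column layout concrete, here is a minimal sketch of a table with the same eight columns and a few basic checks on it. The three records are invented for illustration and are not taken from collection_example.

```r
# Toy stand-in for a living-collection table (all values are invented).
toy_collection <- data.frame(
  ItemAccNoFull  = c("19990001*A", "19990001*B", "20050042*A"),
  AccNoFull      = c("19990001", "19990001", "20050042"),
  AccYear        = c(1999L, 1999L, 2005L),
  ProvenanceCode = c("W", "W", "G"),
  TaxonName      = c("Quercus robur", "Quercus robur", "Acer campestre"),
  TaxonNameFull  = c("Quercus robur L.", "Quercus robur L.", "Acer campestre L."),
  ItemStatusType = c("Existing", "NotExisting", "Existing"),
  ItemStatusDate = as.Date(c("2020-01-15", "2010-06-30", "2021-03-02")),
  stringsAsFactors = FALSE
)

dim(toy_collection)                        # number of records and columns
table(toy_collection$ItemStatusType)       # existing vs. not existing items
length(unique(toy_collection$TaxonName))   # distinct taxon names
```

The same three calls can be run on your own collection to check its size and status breakdown before enriching it.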

Enriching the collection

To enrich the collection dataset we need to match records in our collection to external botanic datasets such as POWO (WCVP), WFO, IUCN RedList, etc. To perform the matching we require the taxonomic name, which in our dataset is contained in TaxonName. Ideally, for better matching, we would also have the taxonomic authority; however, this is optional as the authority is not always known.
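As a rough illustration of what matching on taxonomic name means, the sketch below performs exact name matching with base R. It is not the BGSmartR matching algorithm, which also handles authors, hybrids, autonyms and typos; the reference table and its column names are hypothetical.

```r
# Illustration only: exact matching on taxon names with base R.
collection_names <- c("Quercus robur", "Acer campestre", "Rosa 'Peace'")

# Hypothetical reference table standing in for an enrichment database.
reference <- data.frame(
  taxon_name = c("Acer campestre", "Quercus robur"),
  family     = c("Sapindaceae", "Fagaceae"),
  stringsAsFactors = FALSE
)

# match() gives the row of `reference` for each name, or NA when no row matches.
idx <- match(collection_names, reference$taxon_name)

matched <- data.frame(
  TaxonName = collection_names,
  family    = reference$family[idx],
  stringsAsFactors = FALSE
)
matched
```

The cultivar name has no counterpart in the reference table, so its enriched value is NA. Unmatched records in an enriched collection show up the same way.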

Creating these enrichment datasets can be time-consuming as they can be very large. See XX for how to create POWO and IUCN RedList enrichment datasets.

Within BGSmartR, the function enrich_collection() enriches a collection from WCVP, IUCN RedList and BGCI PlantSearch, provided the enrichment databases were created with BGSmartR methods. For other databases we can use enrich_collection_from_enrich_database(). Below we give examples of both.

With this in mind, we load a cached WCVP extract relating to the example collection to perform the enrichment.

# Load simplified POWO dataset for example.
load('data/wcvp_getting_started.rda')

# Enrich the collection with POWO (WCVP) information
enrich_collection = BGSmartR::enrich_collection(collection,
                                                wcvp = wcvp)
#> 
#> ── Sanitise taxon name and extract author ──
#> 
#> ── Adding POWO information ──
#> 
#> ℹ `123` records found.
#> 
#> ── Extracting taxon names and authors from the original report ──
#> 
#> ── Taxon authors not provided. ──
#> 
#> ── Reducing to unique taxon names ──
#> 
#> ℹ `100` unique taxon names/ taxon name author combinations found.
#> ℹ Found 0 exceptions to known not in POWO.
#> 
#> ── Removing known not to be in POWO from 100 names ──
#> 
#> ✔ Found 19 known not to be in POWO
#> 
#> ── Matching 81 names to unique taxon names ──
#> 
#> ✔ Found 74 of 81 names
#> 
#> ── Matching 7 names to non-unique taxon names ──
#> 
#> ✔ Found 4 of 7 names
#> 
#> ── Testing and matching taxon name issues for 3 names ──
#> 
#> ℹ Trying removing autonyms from taxon names
#> ℹ Trying fixing hybrid for taxon names with 2/3/4 words 2 names
#> ℹ Trying changing/removing hybrid for taxon names 1 name
#> ✔ Found 0 of 3 names
#> 
#> ── Testing and matching typos for 3 names ──
#> 
#> ✔ Found 0 of 3 names
#> 
#> ── Converting to accepted name.. ──
#> 
#> ✔ Updated to accepted name for 5 of 100 names
#> 
#> ── Matching Complete ──
#>

The console messages report each stage of the matching process. In total, 30 additional columns have been added to the original dataset. Below we show the information for the first two plant records (transposed for easier viewing).

enrich_collection[1:2, ] |> t() |> data.frame() |> DT::datatable(rownames = TRUE)

We see that we have information detailing the matching process, and information taken from the WCVP database.
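One way to see which columns an enrichment step added is to compare column names before and after. The sketch below uses toy data frames with invented column names. With the objects from this example, the equivalent comparison would be setdiff(names(enrich_collection), names(collection)).

```r
# Toy "before" and "after" tables; the added column names are invented.
original <- data.frame(TaxonName = c("Quercus robur", "Acer campestre"),
                       stringsAsFactors = FALSE)

enriched <- original
enriched$toy_family      <- c("Fagaceae", "Sapindaceae")
enriched$toy_match_found <- c(TRUE, TRUE)

# Columns present after enrichment but not before.
added_columns <- setdiff(names(enriched), names(original))
added_columns
length(added_columns)
```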

If we also want to match the collection to BGCI GlobalTreeSearch, we could do the following:

# This link may become invalid over time. See https://tools.bgci.org/global_tree_search.php for newer link if needed.
BGCI_trees <- read.csv(url('https://tools.bgci.org/global_tree_search_trees_1_7.csv'))[,1:2]
names(BGCI_trees) = c('taxon_names', 'taxon_author')
BGCI_trees$is_tree = rep(TRUE, nrow(BGCI_trees)) # Add a column stating that each record is a tree.

# Convert to BGSmartR format.
BGCI_trees = BGSmartR::prepare_enrich_database(BGCI_trees,
                                  enrich_taxon_name_column = 'taxon_names',
                                  enrich_taxon_authors_column = 'taxon_author',
                                  console_message = TRUE)
#> ── Sanitising taxonomic names and author ───────────────────────────────────────
#> 
#> ── Add taxon name length column ────────────────────────────────────────────────
#> 
#> ── Add single entry column ─────────────────────────────────────────────────────
#> 
#> ── Add author words column ─────────────────────────────────────────────────────
#> 
#> ── Sort the records into alphabetical order ────────────────────────────────────

# Enrich the collection with tree information
enrich_collection = BGSmartR::enrich_collection_from_enrich_database(
  enrich_collection,
  enrich_database = BGCI_trees,
  taxon_name_column = 'TaxonName',
  taxon_name_full_column = 'TaxonNameFull',
  enrich_taxon_name_column = 'sanitise_name',
  enrich_taxon_authors_column = 'sanitise_author',
  columns_to_enrich = 'is_tree'
  )
#> ℹ `123` records found.
#> 
#> ── Extracting taxon names and authors from the collection ──
#> 
#> ── Reducing to unique taxon name and author combinations ──
#> 
#> ℹ `100` unique taxon names/ taxon name author combinations found.
#> 
#> ── Removing cultivars and indeterminates from 100 names ──
#> 
#> ✔ Found 19 cultivars and indeterminates
#> 
#> ── Matching 81 names to unique taxon names ──
#> 
#> ✔ Found 9 of 81 names
#> 
#> ── Matching 72 names to non-unique taxon names ──
#> 
#> ✔ Found 0 of 72 names
#> ℹ Trying removing autonyms from taxon names
#> ℹ Trying fixing hybrid for taxon names with 2/3/4 words 17 names
#> ℹ Trying changing/removing hybrid for taxon names 3 names
#> 
#> ── Testing and matching taxon name issues for 72 names ──
#> 
#> ℹ Trying removing autonyms from taxon names
#> ℹ Trying fixing hybrid for taxon names with 2/3/4 words 71 names
#> ℹ Trying changing/removing hybrid for taxon names 3 names
#> ✔ Found 0 of 72 names
#> 
#> ── Testing and matching typos for 72 names ──
#> 
#> ✔ Found 0 of 72 names
#> 
#> ── Matching Complete ──
#>

We find that 10 records are trees.

enrich_collection$Enrich_is_tree |> table(useNA = 'always')
#> 
#> TRUE <NA> 
#>   10  113

Note that this method could be improved; see XX.
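When summarising a flag such as Enrich_is_tree, it helps to count the missing values separately from the matches. The sketch below uses a toy vector that mirrors the counts in the table above (10 TRUE and 113 NA). Reading NA as "no match found" rather than "not a tree" is an assumption based on the matching output, where cultivars and unmatched names receive no value.

```r
# Toy flag vector mirroring the counts above: 10 TRUE values and 113 NA values.
is_tree <- c(rep(TRUE, 10), rep(NA, 113))

n_trees   <- sum(is_tree, na.rm = TRUE)   # records flagged as trees
n_unknown <- sum(is.na(is_tree))          # records with no match
share     <- n_trees / length(is_tree)    # proportion of all records

n_trees
n_unknown
round(100 * share, 1)                     # percentage of records flagged as trees
```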

In summary, we can use enrich_collection() to enrich a collection with a combination of WCVP, IUCN RedList or BGCI PlantSearch information where these databases have been created via BGSmartR. To enrich a collection with any other enrichment database, we can use prepare_enrich_database() to prepare the database and enrich_collection_from_enrich_database() to perform the enrichment.

Creating reports

Within BGSmartR we can create 12 different reports:

Overview: The “best” information and graphics from the other reports.
Trends: Exploring how the collection has changed over time.
Turnover: A look at the flow of plants into and out of the collection.
Geography: Where in the world the plants of the collection come from.
Taxonomic Diversity: What type of plants are held in the collection.
Threatened: A deep dive into threatened plants in the collection.
Native: A deep dive into plants in the collection that are native to the collection’s location.
Endemic: A deep dive into endemic plants in the collection.
Tree: A deep dive into trees in the collection.
Duplication: Exploring how often multiple copies of the same plant exist in the collection.
Data Health: A look into the health of the data records for the collection.
Sustainability: How long plants “survive” in the collection.

To run these reports we require the collection’s information to be enriched with data from POWO, IUCN RedList, BGCI GlobalTreeSearch and BGCI’s PlantSearch. In addition we sometimes need further information such as:

detailed_IUCN_redlist: IUCN RedList information enriched with POWO ID and geography (where possible).
wcvp: World Checklist of Vascular Plants (WCVP) information where each record is linked to IUCN RedList and GlobalTreeSearch (where possible).
wgsrpd3: Geographic information on botanical countries, used in geographic graphics.
coordinates: Coordinates of the collection’s location (used to determine native plants).

Within the BGSmartR package we provide an example enriched dataset, BGSmartR::enriched_collection_example, which can be used to test the outputs of all the reports.

Note that some of the reports create geographic maps of plant distributions. To do this we require the geographical information contained in the World Geographical Scheme for Recording Plant Distributions (WGSRPD). This can be obtained from POWO (link) and loaded into your environment, or you can use the rWCVPdata package to load the data.

Moreover, to create some of the reports further information is required such as:

wcvp: World Checklist of Vascular Plants enriched with further information.
detailed_IUCN_redlist: IUCN RedList information with history and selected WCVP information.
endemic_species_per_region: The number of endemic species found in each geographic region.
accepted_species_per_region: The number of accepted species found in each geographic region.
tree_species_per_region: The number of tree species found in each geographic region.

Versions of these objects, stored as .rda files, can be found at (these are from early 2024 and do not contain any newer information):

Disclaimer: We do not claim ownership of the data linked above. This information is publicly available and shared for informational purposes only. If you are the rightful owner of any content and would like it removed, please contact us at jjp68@cam.ac.uk or message us on the GitHub repo, and we will address your request promptly.
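Since these objects are distributed as .rda files, they need to be restored with load() before the report functions can use them. The sketch below writes a stand-in .rda file to a temporary directory so that it is runnable on its own. The object name toy_region_counts and its contents are invented and do not reflect the structure of the real files.

```r
# Create a stand-in .rda file (real files come from the download locations).
toy_path <- file.path(tempdir(), "toy_enrichment.rda")
toy_region_counts <- data.frame(region = c("GRB", "FRA"), n = c(1L, 2L))
save(toy_region_counts, file = toy_path)
rm(toy_region_counts)

# load() restores the saved objects under their original names
# and invisibly returns those names as a character vector.
loaded <- load(toy_path)
loaded
exists("toy_region_counts")
```

Checking the value returned by load() is a quick way to find out what an unfamiliar .rda file has added to your environment.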

enriched_report = BGSmartR::enriched_collection_example
collection = 'My collection'
coordinates = c(52.19376551784332, 0.12777705055343996) # set to CUBG.
output_dir = getwd()
wgsrpd3 = rWCVPdata::wgsrpd3 # Use the rWCVPdata package to get wgsrpd3 information.
# The linked data must also be downloaded and loaded into the R environment (otherwise some reports will fail).

BGSmartR::create_trends_report(enriched_report = enriched_report,
                               collection = collection,
                               coordinates = coordinates,
                               wgsrpd3 = wgsrpd3,
                               output_dir =paste0(output_dir, '/trends'),
                               min_year = 1970)

create_turnover_report(enriched_report = enriched_report,
                       collection = collection,
                       coordinates = coordinates,
                       wgsrpd3 = wgsrpd3,
                       min_year = 1970,
                       output_dir =paste0(output_dir, '/turnover'))

BGSmartR::create_geography_report(enriched_report = enriched_report,
                                  collection = collection,
                                  wgsrpd3 = wgsrpd3,
                                  wcvp = wcvp,
                                  detailed_IUCN_redlist = detailied_IUCN,
                                  output_dir =paste0(output_dir, '/geography'),
                                  endemic_species_per_region = endemic_species_per_region,
                                  accepted_species_per_region = accepted_species_per_region,
                                  tree_species_per_region = tree_species_per_region,
                                  do_download = F)

create_threatened_report(enriched_report = enriched_report,
                         collection = collection,
                         coordinates = coordinates,
                         wgsrpd3 = wgsrpd3,
                         detailed_IUCN_redlist = detailied_IUCN,
                         wcvp = wcvp,
                         output_dir = paste0(output_dir, '/threatened'))

BGSmartR::create_duplication_report(enriched_report = enriched_report,
                                    collection = collection,
                                    output_dir =paste0(output_dir, '/duplication_rareity'))

create_native_report(enriched_report = enriched_report,
                     collection = collection,
                     coordinates = coordinates,
                     wgsrpd3 = wgsrpd3,
                     wcvp = wcvp,
                     output_dir =paste0(output_dir, '/native'))

BGSmartR::create_taxonomic_diversity_report(enriched_report = enriched_report,
                                            collection = collection,
                                            wcvp = wcvp,
                                            output_dir =paste0(output_dir, '/taxonomic_diversity'))

BGSmartR::create_overview_report(enriched_report = enriched_report,
                                 collection = collection,
                                 wgsrpd3 = wgsrpd3,
                                 coordinates = coordinates,
                                 output_dir =paste0(output_dir, '/overview'))

BGSmartR::create_data_health_report(enriched_report = enriched_report,
                                    collection = collection,
                                    wgsrpd3 = wgsrpd3,
                                    coordinates = coordinates,
                                    output_dir =paste0(output_dir, '/data_health'))

BGSmartR::create_trees_report(enriched_report = enriched_report,
                              collection = collection,
                              coordinates = coordinates,
                              wgsrpd3 = wgsrpd3,
                              wcvp = wcvp,
                              output_dir =paste0(output_dir, '/trees'))

BGSmartR::create_endemic_report(enriched_report = enriched_report,
                                collection = collection,
                                coordinates = coordinates,
                                wgsrpd3 = wgsrpd3,
                                wcvp = wcvp,
                                output_dir =paste0(output_dir, '/endemic'))

BGSmartR::create_sustainability_report(enriched_report = enriched_report,
                                       collection = collection,
                                       wgsrpd3 = wgsrpd3,
                                       coordinates = coordinates,
                                       output_dir =paste0(output_dir, '/sustainability'))
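Each call above writes to its own subdirectory of output_dir. Whether the report functions create missing directories themselves is not stated here, so one cautious option is to create them beforehand. The sketch below does this with base R under a temporary root, which is an arbitrary choice for the example, and covers only three of the report names.

```r
# Build one output directory per report under a common root.
report_names <- c("trends", "turnover", "geography")
output_root  <- file.path(tempdir(), "reports")
report_dirs  <- file.path(output_root, report_names)

# recursive = TRUE also creates the root; showWarnings = FALSE keeps
# the loop quiet if a directory already exists.
for (d in report_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

all(dir.exists(report_dirs))
```

file.path() is used in place of paste0() so that path separators are handled consistently.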

As the reports are relatively large (generally ~30 MB), they are not included in the package. However, you can access the outputs of the above code from the following link: BGsmartR example reports.