Plankton and other Particles (IFCB, UVP)

The guidelines presented below will assist the user in submitting imaging-in-flow or submersible microscopic data to SeaBASS. New metadata headers and fieldnames have been developed for plankton and other particle data. Additionally, detailed instructions are given for submission of documentation (protocol document, checklist, etc.), images and additional ancillary files. The submitter must indicate in the checklist whether all ROIs in a sample were annotated. Greater detail regarding the development and application of these guidelines and requirements can be found in Neeley et al., In Press.

 
Neeley, A., S. Beaulieu, C. Proctor, I. Cetinic, J. Futrelle, I. Ramos-Santos, H. Sosik, E. Devred, L. Karp-Boss, M. Picheral, N. Poulton, C. Roesler, and A. Shepherd. (In Press). Standards and practices for reporting plankton and other particle observations from images. XX pp. DOI TBD



Required documents, plankton and other particles

In addition to the standard headers and fields required for all SeaBASS submissions, plankton and other particle observations require the additional headers and fields described below (further details can be found in subchapter 2.3 of Neeley et al., In Press). The following sections describe the special metadata headers and fields; the protocol and checklist documents; how to submit data from non-conforming ROIs (e.g., non-living organisms and non-identifiable particles) using a YAML file; the supplementary lists of taxonomic categories assessed by manual and/or automatic classification methods, which are needed if not every ROI in a given data file was classified; and the raw images and any relevant instrument metadata on which the Level 1b files are based. Please use the list below as a guideline for the submission requirements.
 
Submission checklist:
  1. Data file that includes the data table and appropriate metadata headers and field names
  2. Protocol document 
  3. Data set checklist
  4. Supplemental Non-conforming ROI YAML file (if applicable)
  5. Raw images  
  6. Ancillary data files (for image processing) 
  7. List of assessed IDs for automated and/or manual classification 

"Conditionally required" metadata headers

Required metadata headers:
  1. volume_sampled_ml (ml): original volume of sample collected in units of milliliters
  2. volume_imaged_ml (ml): subset of volume_sampled_ml that was imaged in units of milliliters
  3. pixel_per_um (um): number of pixels per micrometer of length
  4. eventID (none): a unique identifier associated with the sample as an event
Required "if applicable" metadata headers:
  1. length_representation_instrument_varname (um): the instrument’s variable name equivalent to ‘length_representation’ (e.g., maxFeretDiameter).  
  2. width_representation_instrument_varname (um): the instrument’s variable name equivalent to ‘width_representation’ (e.g., minFeretDiameter).  
Optional metadata header:
  1. associatedMedia_source (none): a unique persistent URL pointing to the landing page for a water sample from which multiple ROIs are derived
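
As a rough illustration of the list above, the following Python sketch checks whether the four required plankton headers are present in a file header. It assumes headers are written as `/name=value` lines, following the `/associated_files=` form used later on this page. The function names and the sample values are hypothetical and are not part of any SeaBASS tool.

```python
# Required plankton and particle metadata headers, taken from the list above.
REQUIRED_HEADERS = ("volume_sampled_ml", "volume_imaged_ml", "pixel_per_um", "eventID")


def parse_headers(lines):
    """Collect '/name=value' header lines into a dict (assumed header syntax)."""
    headers = {}
    for line in lines:
        line = line.strip()
        if line.startswith("/") and "=" in line:
            name, value = line[1:].split("=", 1)
            headers[name] = value
    return headers


def missing_required(headers):
    """Return the required header names that are absent or empty."""
    return [name for name in REQUIRED_HEADERS if not headers.get(name)]


# Hypothetical header excerpt; the values are made up for illustration.
example = [
    "/volume_sampled_ml=5.0",
    "/volume_imaged_ml=4.2",
    "/pixel_per_um=3.4",
]

print(missing_required(parse_headers(example)))  # eventID is missing here
```

A check like this only confirms that the headers exist; it does not validate their values or units.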

"Conditionally required" data fields

The next section describes how to format the data table for plankton and other particle observations. An external, machine-readable and resolvable identifier, or object number, that returns nomenclatural (not taxonomic) details of a name (scientificNameID) should also be included for each ROI (not applicable to non-living particles).
 
Required fields:
  1. associatedMedia (none): a unique persistent identifier of the media associated with the occurrence. The field provides the unique imagery file name corresponding to the source of the ROI or a URL pointing to a permanent landing page for the ROI image. 
  2. biovolume (um^3): Biovolume for the target detected within the ROI determined by means specified in the biovolume calculation method or protocol document.
  3. area_cross_section (um^2): Cross-sectional area of the target detected within the ROI determined by means specified in the image processing method or protocol document.
  4. length_representation (um): Representation of length of the target detected within the ROI or largest mesh size for which the target could be retained, determined by means specified in the image processing method or protocol document.
  5. width_representation (um): Representation of width of the target detected within the ROI or smallest mesh size through which the target could pass, determined by means specified in the image processing method or protocol document.

Required "if applicable" fields:
  1. data_provider_category_automated (none): A category used by the data provider to name the organism or particle for an automated classification, not necessarily a scientific name (e.g., pennate or detritus).
  2. scientificName_automated (none): A scientific name from a recognized taxonomic reference database (e.g., WoRMS, AlgaeBase) at the lowest level that matches the data provider's category for an automated classification, paired to a scientificNameID. Generally, the ROI corresponds to an occurrence assigned to a single taxonomic name, e.g., Karenia brevis.
  3. scientificNameID_automated (none): A life science identifier (LSID) from a recognized taxonomic reference database (e.g., WoRMS, AlgaeBase) at the lowest level that matches the data provider's category for an automated classification, e.g., urn:lsid:marinespecies.org:taxname:233015. Here ‘urn:lsid’ indicates an ID that is specific to life science data and is used for all files, marinespecies.org is the URL for the reference database WoRMS, and the namespace ‘taxname’ informs the user that the following number represents a unique numerical identifier or taxon identifier in WoRMS. 233015 is the taxon identifier (AphiaID) in WoRMS for the dinoflagellate species Karenia brevis.
  4. data_provider_category_manual (none): A category used by the data provider to name the organism or particle for a manual identification, not necessarily a scientific name.
  5. scientificName_manual (none): A scientific name from a recognized taxonomic reference database (e.g., World Register of Marine Species, AlgaeBase) at the lowest level that matches the data provider's category for a manual identification, paired to a scientificNameID. Generally, the ROI corresponds to an occurrence assigned to a single taxonomic name.
  6. scientificNameID_manual (none): A life science identifier (LSID) from a recognized taxonomic reference database (e.g., World Register of Marine Species, AlgaeBase) at the lowest level that matches the data provider's category for a manual identification.
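
The LSID structure described for the scientificNameID fields can be sketched in Python as follows. This is a minimal illustration that splits an identifier into the parts named in the text (authority, namespace, and object identifier). The function name `parse_lsid` is hypothetical, and the sketch ignores the optional revision component that some LSIDs carry.

```python
def parse_lsid(lsid):
    """Split 'urn:lsid:<authority>:<namespace>:<object_id>' into its parts."""
    parts = lsid.strip().split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a five-part LSID: {lsid!r}")
    return {"authority": parts[2], "namespace": parts[3], "object_id": parts[4]}


# The WoRMS example from the field description (Karenia brevis).
info = parse_lsid("urn:lsid:marinespecies.org:taxname:233015")
print(info["authority"], info["namespace"], info["object_id"])
```

Custom namespace IDs such as ‘ptwg:bead’ are not LSIDs and would be rejected by this check, so they need to be handled separately.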
Recommended or Optional fields:
  1. equivalent_spherical_diameter (um): Equivalent spherical diameter of the target detected within the ROI determined by means specified in the image processing method or protocol document.
  2. area_based_diameter (um): Area-based diameter of the target detected within the ROI determined by means specified in the image processing method or protocol document.
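
For orientation, the two optional diameters can be derived from the required biovolume and area_cross_section fields using the standard geometric definitions (the diameter of a sphere of equal volume, and the diameter of a circle of equal area). The sketch below assumes those definitions; the method actually used for a submission is whatever the protocol document specifies, and the function names are illustrative.

```python
import math


def equivalent_spherical_diameter(biovolume_um3):
    """Diameter (um) of a sphere with the given volume: d = (6V / pi)^(1/3)."""
    return (6.0 * biovolume_um3 / math.pi) ** (1.0 / 3.0)


def area_based_diameter(area_um2):
    """Diameter (um) of a circle with the given area: d = 2 * sqrt(A / pi)."""
    return 2.0 * math.sqrt(area_um2 / math.pi)


# A sphere of diameter 10 um has volume pi/6 * 10^3 and cross-section pi * 5^2.
volume = math.pi / 6.0 * 10.0 ** 3
area = math.pi * 5.0 ** 2
print(round(equivalent_spherical_diameter(volume), 6))
print(round(area_based_diameter(area), 6))
```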

Protocol and checklist documents

Submission of data files containing plankton and other particle observations requires supplemental documentation, including both a checklist and a protocol document, to preserve critical methods information. Please download the checklist and protocol templates, fill them out, and submit them along with your other documentation and calibration information. Example submission files can be found below.
 
Download checklist template and example:
Download plankton and other particles checklist template (V20210325)
Download plankton and other particles checklist example (V20210325)
 
Download protocol template and example:
Download a protocol template (V20210325)
Download a protocol example (V20210325)

Non-conforming ROIs (YAML file)

Some ROIs may be defined as ‘non-conforming’, meaning that they are either not identifiable (e.g., blurry images or image artifacts) or are non-living (e.g., bead, bubble, detritus, etc.). To facilitate identification of non-conforming ROIs, custom definitions not found in a taxonomic authority must be provided in an external document file. A Phytoplankton Taxonomy Working Group (‘PTWG’) custom namespace was created to define several standardized names for common terms that are not currently defined by WoRMS or AlgaeBase. As of March 2021, this includes: ‘bad_image’, ‘bead’, ‘bubble’, ‘detritus’, ‘fecal_pellet’, and ‘other’. The term ‘other’ should only be used to describe a non-living particle. Optionally, non-conforming ROIs defined in the ‘PTWG’ namespace may be supplemented by more specific higher-level definitions. For example, ‘MYNICKNAME:opaque_detritus’ could be used as a scientificNameID to enhance ‘ptwg:detritus’.

 
Custom terms must be defined in a separate, supplemental plain-text namespace file in YAML format, with each term containing ‘id’, ‘definition’, and ‘associated_terms’. These IDs are paired with definitions and stored in a YAML-formatted file so that it can serve as a machine-readable configuration file for anyone working with the data files. To use the terms, combine the ‘prefix’ of the namespace (i.e., ptwg) with a given ID in the scientificNameID column, for example ‘ptwg:bead’ or ‘ptwg:detritus’. The relevant scientificName and (if present) the recommended data_provider_category columns should be filled with the ID value (e.g., ‘detritus’). If the data provider is confident that the ROI is a living particle but cannot identify it to a specific taxonomic rank, then it should be classified to the rank of Eukaryota or Prokaryota. If a submission uses the PTWG namespace, download the PTWG file and include it as part of the submission documents (see Neeley et al., In Press, subchapters 2.4.1 to 2.4.4).
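
The following Python sketch illustrates how a custom namespace term and the matching data columns might fit together. Only the per-term keys ‘id’, ‘definition’, and ‘associated_terms’ and the existence of a ‘prefix’ come from the description above. The top-level layout (`prefix` plus a `terms` list), the term definition text, and the function names are assumptions made for illustration, so the official YAML template should be treated as the reference for the real structure.

```python
# Hypothetical custom namespace; the layout and definition text are assumptions.
namespace = {
    "prefix": "MYNICKNAME",
    "terms": [
        {
            "id": "opaque_detritus",
            "definition": "Detritus that appears fully opaque in the image.",
            "associated_terms": ["ptwg:detritus"],
        },
    ],
}


def to_yaml(ns):
    """Render the namespace as simple YAML text without third-party libraries."""
    lines = ["prefix: {}".format(ns["prefix"]), "terms:"]
    for term in ns["terms"]:
        lines.append("  - id: {}".format(term["id"]))
        lines.append('    definition: "{}"'.format(term["definition"]))
        lines.append("    associated_terms:")
        for associated in term["associated_terms"]:
            lines.append('      - "{}"'.format(associated))
    return "\n".join(lines) + "\n"


def nonconforming_fields(prefix, term_id, method="manual"):
    """Fill the category, name, and ID columns for a non-conforming ROI."""
    return {
        "data_provider_category_" + method: term_id,
        "scientificName_" + method: term_id,
        "scientificNameID_" + method: "{}:{}".format(prefix, term_id),
    }


print(to_yaml(namespace))
print(nonconforming_fields("ptwg", "bead"))
```

The second function mirrors the rule stated above: the scientificNameID is the prefix joined to the ID, while the scientificName and data_provider_category columns carry the bare ID.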
 

The PTWG YAML file can be found here.

Download a supplemental YAML file template (V20210325)
Download a supplemental YAML file example (V20210325)

Submission of Images and ancillary data files

Data submissions should include an organized directory containing the raw images and any relevant instrument metadata on which the Level 1b files are based. These should be provided even if a version of the annotated images is hosted at another repository, such as EcoTaxa. If the raw images are hosted externally, then this local directory may contain either the individual images or a more efficient data format (for example, sample-level endpoint files for IFCB data). If the submission is extremely large, it should be split into more manageable pieces (e.g., by year for a long time series). SeaBASS will package these source files, including the raw images, into a compressed tar file that is made optionally available. The name or names of the highest-level directory must be provided in the metadata header /associated_files=. If that header contains multiple values, list them in a comma-separated format with no spaces.

 

In addition to the raw images, the ancillary file sets necessary for processing the raw images must be submitted. For the McLane Labs Imaging FlowCytobot, this would include the roi, hdr and adc files. For the Yokogawa FlowCAM, this would include the lst, ctx, edg and tif image files.
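
Before packaging a submission, it can help to confirm that every IFCB sample has its full roi, hdr and adc set. The Python sketch below groups file names by their stem and reports the missing extensions. It assumes the three files of a sample share one base name, and the second sample name in the example is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath

# Ancillary extensions named above for the Imaging FlowCytobot.
IFCB_EXTENSIONS = frozenset({".roi", ".hdr", ".adc"})


def incomplete_samples(filenames, required=IFCB_EXTENSIONS):
    """Map each sample stem to the sorted list of required extensions it lacks."""
    found = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in required:
            found[path.stem].add(suffix)
    return {
        stem: sorted(required - extensions)
        for stem, extensions in found.items()
        if extensions != required
    }


files = [
    "D20180201T103729_IFCB102.roi",
    "D20180201T103729_IFCB102.hdr",
    "D20180201T103729_IFCB102.adc",
    "D20180201T110000_IFCB102.roi",  # hypothetical second sample, hdr missing
    "D20180201T110000_IFCB102.adc",
]
print(incomplete_samples(files))
```

The same function could be reused for FlowCAM file sets by passing a different `required` set, provided those files also share a base name, which is an assumption to verify against the actual instrument output.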

Assessed IDs list

Providing a list of all scientificName/scientificNameID pairs assessed by the automated classifier with the data submission enables the determination of both presence and absence of annotations in the Level 1b file. Supplementary lists of which taxonomic categories were assessed by manual and/or automatic classification methods are strongly recommended, and are required as part of data submissions if not every ROI in a given data file was classified. If not every ROI was classified, these lists are essential for the downstream creation of summary products involving the concentrations of phytoplankton taxa.
 
When every ROI is classified, these lists are useful for determining absence. These lists may be specific to a given water sample or data file (e.g., if only diatoms are classified in a sample), or they may cover every class in a classifier. Two lists are required for a given data file: one of all LSIDs assessed by automated classification and one of all LSIDs assessed by manual annotation. To link a given SeaBASS data file to its associated lists, include the header /associated_files and provide the names of the associated files as its value, in a comma-separated list (no spaces).
 
For example: /associated_files=Automated_assessed_id_D20180201T103729_IFCB102.txt,Manual_assessed_id_D20180201T103729_IFCB102.txt. The same associated file names may be referenced in multiple data files, so it is only necessary to create additional files if different categories were assessed for different data files. 
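
A small Python helper can build this header line and guard against stray spaces or commas. The function name is hypothetical, and rejecting file names that contain whitespace is an assumption that follows from the "no spaces" rule rather than a documented SeaBASS check.

```python
def associated_files_header(filenames):
    """Build the /associated_files= header as a comma-separated list, no spaces."""
    for name in filenames:
        if "," in name or any(ch.isspace() for ch in name):
            raise ValueError(f"file name contains a comma or whitespace: {name!r}")
    return "/associated_files=" + ",".join(filenames)


print(associated_files_header([
    "Automated_assessed_id_D20180201T103729_IFCB102.txt",
    "Manual_assessed_id_D20180201T103729_IFCB102.txt",
]))
```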

 

The following table is an example of an assessed ID table submission. In this example of manual annotation, all ROIs in the sample were annotated within the genera Chaetoceros and Strombidium. Although the data provider category column lists nine different categories of Chaetoceros, the data provider was confident only to the genus level for eight of those nine categories and to the species level for one (Chaetoceros socialis), as indicated by the scientificNameID in the third column. In contrast, of the seven Strombidium categories, the data provider was confident to the species level in three, as indicated by the scientificNameID.

| data_provider_category_manual | scientificName_manual | scientificNameID_manual |
| --- | --- | --- |
| Chaetoceros | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Chaetoceros concavicornis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Chaetoceros curvusetus | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Chaetoceros danicus | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Chaetoceros debilis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Chaetoceros didymus | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Chaetoceros peruvianis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Chaetoceros socialis | Chaetoceros socialis | urn:lsid:marinespecies.org:taxname:149123 |
| Chaetoceros subtilis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985 |
| Strombidium capitatum | Strombidium capitatum | urn:lsid:marinespecies.org:taxname:101282 |
| Strombidium conicum | Strombidium conicum | urn:lsid:marinespecies.org:taxname:101289 |
| Strombidium inclinatum | Strombidium | urn:lsid:marinespecies.org:taxname:101195 |
| Strombidium oculatum | Strombidium tintinnodes | urn:lsid:marinespecies.org:taxname:101337 |
| Strombidium sp1 | Strombidium | urn:lsid:marinespecies.org:taxname:101195 |
| Strombidium sp2 | Strombidium | urn:lsid:marinespecies.org:taxname:101195 |
| Strombidium wulffi | Strombidium | urn:lsid:marinespecies.org:taxname:101195 |
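
To show why an assessed ID list matters, the Python sketch below combines a list of assessed scientificNameIDs with the IDs actually recorded for ROIs in one data file. Assessed IDs with a count of zero can then be read as absences rather than as "not looked for". The function name and the ROI list are hypothetical; the three LSIDs are taken from the example table.

```python
from collections import Counter

WORMS = "urn:lsid:marinespecies.org:taxname:"


def presence_absence(assessed_ids, roi_ids):
    """Count ROIs per assessed ID and flag IDs that were never declared as assessed."""
    counts = Counter(roi_ids)
    table = {lsid: counts.get(lsid, 0) for lsid in sorted(assessed_ids)}
    unassessed = sorted(set(counts) - set(assessed_ids))
    return table, unassessed


# IDs assessed for this data file (Chaetoceros, C. socialis, S. capitatum).
assessed = [WORMS + "148985", WORMS + "149123", WORMS + "101282"]

# Hypothetical scientificNameID_manual values, one per annotated ROI.
rois = [WORMS + "148985", WORMS + "148985", WORMS + "101282"]

table, unassessed = presence_absence(assessed, rois)
for lsid, count in table.items():
    print(lsid, count)
print("IDs in data but not in assessed list:", unassessed)
```

In this made-up sample, Chaetoceros socialis gets a count of zero, which is only meaningful as an absence because its ID appears in the assessed list.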

Taxonomic lookup table

A taxonomic lookup table is essential to ensure that data provider categories are accurately paired with their scientificName and scientificNameID. Data provider categories are the categories used by the data provider to name the organism or particle for a classification and are not necessarily scientific names (e.g., pennate or detritus). The scientificName/scientificNameID pairs can be determined manually by searching WoRMS, or automatically using web services with a script or with the WoRMS Taxon Match Graphical User Interface (GUI). When scientificName/scientificNameID pairs have been determined manually, we recommend confirming that each pair is accepted in WoRMS, either using the GUI or by using an automated workflow in a script. When using web services to determine the pairs, some manual cleanup may be required to ensure that the correct scientificName/scientificNameID pairs are provided. Using web services can also correct a misspelled scientificName and retrieve hierarchical ranks. Available scripts include the R package ‘worms’ and the R package ‘taxize’**. Further guidance for creating a lookup table can be found in Neeley et al., In Press, subchapters 2.2.1 and 2.2.2.
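
Once a lookup table exists, applying it is a simple mapping step. The Python sketch below uses a small dictionary whose pairs are copied from the assessed ID example on this page, and reports any data provider category that has no entry so it can be resolved by hand. The function name and the category ‘pennate’ without an entry are illustrative; the sketch does not query WoRMS or verify that a name is accepted.

```python
# Lookup table: data provider category -> (scientificName, scientificNameID).
# Pairs copied from the assessed ID example; extend as needed for real data.
LOOKUP = {
    "Chaetoceros": ("Chaetoceros", "urn:lsid:marinespecies.org:taxname:148985"),
    "Chaetoceros socialis": ("Chaetoceros socialis", "urn:lsid:marinespecies.org:taxname:149123"),
    "Strombidium sp1": ("Strombidium", "urn:lsid:marinespecies.org:taxname:101195"),
}


def annotate(categories, lookup):
    """Return (rows, unmatched): matched triples and categories lacking an entry."""
    rows, unmatched = [], []
    for category in categories:
        key = category.strip()
        if key in lookup:
            name, lsid = lookup[key]
            rows.append((key, name, lsid))
        else:
            unmatched.append(key)
    return rows, unmatched


rows, unmatched = annotate(["Chaetoceros socialis", "Strombidium sp1", "pennate"], LOOKUP)
for row in rows:
    print(",".join(row))
print("needs a lookup entry:", unmatched)
```

Keeping unmatched categories separate, rather than silently dropping them, makes the manual cleanup step described above easier to track.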
 

**Disclaimer: Software package examples listed here do not constitute an endorsement or recommendation by NASA. 

Example submission, plankton and other particles

 Example SeaBASS data file. Note that the example file doesn't include the optional fields.
(Example SeaBASS file)
Last edited by Chris Proctor on 2021-07-26
Created by Chris Proctor on 2021-03-23