Plankton and particles imagery (IFCB, UVP)

The guidelines presented below will assist the user in submitting imaging-in-flow or submersible microscopic data to SeaBASS. New metadata headers and field names have been developed for plankton and other particle data. Additionally, detailed instructions are given for the submission of documentation (protocol document, checklist, etc.), images, and additional ancillary files. The submitter must indicate in the checklist whether all ROIs in a sample were annotated. Greater detail regarding the development and application of these guidelines and requirements can be found in Neeley et al., 2021.

 
Neeley, A., S. Beaulieu, C. Proctor, I. Cetinic, J. Futrelle, I. Soto Ramos, H. Sosik, E. Devred, L. Karp-Boss, M. Picheral, N. Poulton, C. Roesler, and A. Shepherd (2021). Standards and practices for reporting plankton and other particle observations from images. 38 pp. DOI: 10.1575/1912/27377




Required documents, plankton and other particles

In addition to the standard headers and fields required for all SeaBASS submissions, the submission of plankton and other particle observations requires additional metadata headers, data fields, and supporting files (see subchapter 2.3 of Neeley et al., 2021). The following sections describe: the special metadata headers and fields; the protocol and checklist documents; how to submit data for non-conforming ROIs (e.g., non-living organisms and non-identifiable particles) using a YAML file; the supplementary lists of taxonomic categories assessed by manual and/or automated classification, which are required if not every ROI in a given data file was classified; and the raw images and any relevant instrument metadata on which the Level 1b files are based. Use the checklist below as a guide to the submission requirements.

Submission checklist:
  1. SeaBASS file using the appropriate metadata headers and field names
  2. Protocol document
  3. Data set checklist
  4. Supplemental Non-conforming ROI YAML file (if applicable)
  5. Raw images (associated archives)
  6. Ancillary data files (for image processing)
  7. List of assessed IDs for automated and/or manual classification

"Conditionally required" metadata headers

In addition to the standard headers and fields required for all SeaBASS submissions, the submission of plankton and other particle observations requires the following metadata headers (see subchapter 2.3 of Neeley et al., 2021).
 
      Required metadata headers:
  1. volume_sampled_ml (ml): original volume of sample collected in units of milliliters
  2. volume_imaged_ml (ml): subset of volume_sampled_ml that was imaged in units of milliliters
  3. pixel_per_um (pixels/um): number of pixels per micrometer (image resolution)
  4. eventID (none): a unique identifier associated with the sample as an event
  5. /associated_archives: List of filenames for all external raw data (e.g., a bundle of images). Please split files that exceed 5 GB. Example: /associated_archives=EXPERIMENT_CRUISE_IFCB_images_associated.tgz
  6. /associated_archive_types: Provide a value or list of text terms to describe the contents of each associated_archive file. Example: /associated_archive_types=planktonic
      Required "if applicable" metadata headers:
  1. length_representation_instrument_varname (um): the instrument’s variable name equivalent to ‘length_representation’ (e.g., maxFeretDiameter).
  2. width_representation_instrument_varname (um): the instrument’s variable name equivalent to ‘width_representation’ (e.g., minFeretDiameter).
  3. /associated_files: The value is the name of the specific source file used for the scientific analysis. Include this header if all the data in this file came from a single source file. If the data came from multiple source files, skip this header and instead use the equivalent field associated_files to name the specific file on each data row.
  4. /associated_file_types: This header must be used in conjunction with associated_files. The entry here should describe the data type within the associated_files. For the purposes of IFCB and UVP images, please use "planktonic". 
      Optional metadata header:
  1. associatedMedia_source (none): a unique persistent URL pointing to the landing page for a water sample from which multiple ROIs are derived
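The header requirements above can be checked before submission. The following Python sketch is an illustration, not SeaBASS validation tooling: it assumes the header values are held in a plain dict, renders each required header as a '/name=value' line (the convention shown in the /associated_archives example), and checks only relationships stated above: volume_imaged_ml is a subset of volume_sampled_ml, and multiple archive names contain no spaces. The function names are hypothetical.

```python
"""Sketch: assemble and sanity-check plankton-specific SeaBASS headers.

Assumptions (hypothetical, not official SeaBASS tooling): header values are
stored in a plain dict keyed by header name without the leading '/'.
"""

REQUIRED_HEADERS = (
    "volume_sampled_ml",
    "volume_imaged_ml",
    "pixel_per_um",
    "eventID",
    "associated_archives",
    "associated_archive_types",
)


def check_headers(headers: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    problems = [f"missing /{name}" for name in REQUIRED_HEADERS if name not in headers]
    if problems:
        return problems
    sampled = float(headers["volume_sampled_ml"])
    imaged = float(headers["volume_imaged_ml"])
    # volume_imaged_ml is a subset of volume_sampled_ml, so it cannot exceed it.
    if not 0 < imaged <= sampled:
        problems.append("volume_imaged_ml must be > 0 and <= volume_sampled_ml")
    if float(headers["pixel_per_um"]) <= 0:
        problems.append("pixel_per_um must be positive")
    # Multiple archive names are listed comma-separated with no spaces.
    if " " in headers["associated_archives"]:
        problems.append("associated_archives must not contain spaces")
    return problems


def format_headers(headers: dict) -> str:
    """Render the required headers as '/name=value' lines in a fixed order."""
    return "\n".join(f"/{name}={headers[name]}" for name in REQUIRED_HEADERS)
```

Returning a list of problems rather than raising on the first one lets a submitter fix every issue in a single pass.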

"Conditionally required" data fields

The next section describes how to format the data table for plankton and other particle observations. Each ROI should also include a scientificNameID: an external, machine-readable, and resolvable identifier that returns nomenclatural (not taxonomic) details of a name (not applicable to non-living particles).

      Required fields:
  1. associatedMedia (none): a unique persistent identifier of the media associated with the occurrence. The field provides the unique imagery file name corresponding to the source of the ROI or a URL pointing to a permanent landing page for the ROI image.
  2. biovolume (um^3): Biovolume for the target detected within the ROI determined by means specified in the biovolume calculation method or protocol document.
  3. area_cross_section (um^2): Cross-sectional area of the target detected within the ROI determined by means specified in the image processing method or protocol document.
  4. length_representation (um): Representation of the length of the target detected within the ROI or largest mesh size for which the target could be retained, determined by means specified in the image processing method or protocol document.
  5. width_representation (um): Representation of the width of the target detected within the ROI or smallest mesh size through which the target could pass, determined by means specified in the image processing method or protocol document.
      Required "if applicable" fields:
  1. data_provider_category_automated (none): A category used by the data provider to name the organism or particle for an automated classification, not necessarily a scientific name (e.g., pennate or detritus).
  2. scientificName_automated (none): A scientific name from a recognized taxonomic reference database (e.g., WoRMS, AlgaeBase) at the lowest level that matches the data provider's category for an automated classification, paired with a scientificNameID. Generally, the ROI corresponds to an occurrence assigned to a single taxonomic name, e.g., Karenia brevis.
  3. scientificNameID_automated (none): A life science identifier (LSID) from a recognized taxonomic reference database (e.g., WoRMS, AlgaeBase) at the lowest level that matches the data provider's category for automated classification. e.g., urn:lsid:marinespecies.org:taxname:233015 where ‘urn:lsid’ indicates the ID that is specific to life science data and is used for all files, marinespecies.org is the URL for the reference database WoRMS, and the namespace ‘taxname’ informs the user that the following number represents a unique numerical identifier or taxon identifier in WoRMS. 233015 represents the taxon identifier (AphiaID) in WoRMS for the dinoflagellate species Karenia brevis.
  4. data_provider_category_manual (none): A category used by the data provider to name the organism or particle for a manual identification, not necessarily a scientific name.
  5. scientificName_manual (none): A scientific name from a recognized taxonomic reference database (e.g., WoRMS, AlgaeBase) at the lowest level that matches the data provider's category for a manual identification, paired with a scientificNameID. Generally, the ROI corresponds to an occurrence assigned to a single taxonomic name.
  6. scientificNameID_manual (none): A life science identifier (LSID) from a recognized taxonomic reference database (e.g., WoRMS, AlgaeBase) at the lowest level that matches the data provider's category for manual identification.
  7. associated_files: The name of the specific IFCB or UVP image used for the scientific analysis on each data row.
  8. associated_file_types: This field must be used in conjunction with associated_files. The entry should describe the data type within associated_files. For the purposes of IFCB and UVP imagery, please use "planktonic".
      Recommended or Optional fields:
  1. equivalent_spherical_diameter (um): Equivalent spherical diameter of the target detected within the ROI, determined by means specified in the image processing method or protocol document.
  2. area_based_diameter (um): Area-based diameter of the target detected within the ROI, determined by means specified in the image processing method or protocol document.
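The size fields above are defined by each submitter's protocol document, but two common conventions are the diameter of a sphere with the same volume, d = (6V/π)^(1/3), and the diameter of a circle with the same area, d = √(4A/π). This sketch assumes the instrument reports measurements in pixels that are scaled with pixel_per_um (lengths divide by pixel_per_um, areas by its square, volumes by its cube). Whether your processing uses these exact definitions is an assumption to confirm against your own protocol.

```python
"""Sketch: common conversions behind the size fields.

Assumption: measurements arrive in pixels, and sphere/circle equivalents are
the chosen definitions; the protocol document remains the authority.
"""
import math


def px_to_um(length_px: float, pixel_per_um: float) -> float:
    """Convert a length in pixels to micrometers."""
    return length_px / pixel_per_um


def px2_to_um2(area_px2: float, pixel_per_um: float) -> float:
    """Convert an area in square pixels to square micrometers."""
    return area_px2 / pixel_per_um**2


def px3_to_um3(volume_px3: float, pixel_per_um: float) -> float:
    """Convert a volume in cubic pixels to cubic micrometers."""
    return volume_px3 / pixel_per_um**3


def equivalent_spherical_diameter(biovolume_um3: float) -> float:
    """Diameter of a sphere with the same volume: d = (6V/pi)^(1/3)."""
    return (6.0 * biovolume_um3 / math.pi) ** (1.0 / 3.0)


def area_based_diameter(area_um2: float) -> float:
    """Diameter of a circle with the same area: d = sqrt(4A/pi)."""
    return math.sqrt(4.0 * area_um2 / math.pi)
```

Converting to micrometers before applying the diameter formulas keeps every reported field in the units listed above.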

Protocol and checklist documents

Submission of data files containing plankton and other particle observations requires supplemental documentation, including both a checklist and a protocol document, to preserve critical methods information. Please download the checklist and protocol templates, fill them out, and submit them along with your other documentation and calibration information. Example submission files can be found below.

Download the checklist template and example:
Download plankton and other particles checklist template (V20210325)
Download plankton and other particles checklist example (V20210325)

Download the protocol template and example:
Download a protocol template (V20210325)
Download a protocol example (V20210325)

Non-conforming ROIs (YAML file)

Some ROIs may be defined as ‘non-conforming’, meaning that they are either not identifiable (e.g., blurry images or image artifacts) or non-living (e.g., beads, bubbles, detritus). Because such terms are not found in a taxonomic authority, custom definitions must be provided in an external document file. A Phytoplankton Taxonomy Working Group (“PTWG”) custom namespace was created to define standardized names for common terms that are not currently defined by WoRMS or AlgaeBase. As of March 2021, this includes ‘bad_image’, ‘bead’, ‘bubble’, ‘detritus’, ‘fecal_pellet’, and ‘other’. The term ‘other’ should only be used to describe a non-living particle. Optionally, non-conforming ROIs defined in the PTWG namespace may be supplemented by more specific custom definitions. For example, ‘MYNICKNAME:opaque_detritus’ could be used as a scientificNameID to refine ‘ptwg:detritus’.


Custom terms must be defined in a separate supplemental plain-text namespace file in YAML format, with each term containing ‘id’, ‘definition’, and ‘associated_terms’. Pairing the IDs with their definitions in a YAML file provides a machine-readable configuration file for anyone working with the data files. To use a term, combine the namespace ‘prefix’ (e.g., ptwg) with a given ID in the scientificNameID column, for example ‘ptwg:bead’ or ‘ptwg:detritus’. Fill the corresponding scientificName column and, if present, the recommended data_provider_category column with the bare ID value (e.g., ‘detritus’). If the data provider is confident that the ROI is a living particle but cannot identify it to a more specific taxonomic rank, it should be classified to the rank of Eukaryota or Prokaryota. If a submission uses the PTWG namespace, download the PTWG file and include it with the submission documents (see Neeley et al., 2021, subchapters 2.4.1-2.4.4).
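The naming conventions above can be sketched in Python. This assumes each scientificNameID is either a WoRMS-style LSID (urn:lsid:marinespecies.org:taxname:&lt;AphiaID&gt;) or a '&lt;prefix&gt;:&lt;term&gt;' value from a custom namespace such as ptwg. The helper names are hypothetical, and the column suffix (_manual or _automated) is left as a parameter.

```python
"""Sketch: build and split scientificNameID values.

Assumptions: IDs are either WoRMS-style LSIDs or '<prefix>:<term>' custom
namespace values; helper names are hypothetical, not part of any library.
"""

LSID_PREFIX = "urn:lsid:"


def custom_id(prefix: str, term: str) -> str:
    """Combine a namespace prefix and term, e.g. ('ptwg', 'bead') -> 'ptwg:bead'."""
    return f"{prefix}:{term}"


def split_name_id(name_id: str) -> tuple[str, str, str]:
    """Return (authority, namespace, identifier) for an ID string.

    For custom namespace IDs the middle element is empty. Raises ValueError
    if the string contains no ':' separator.
    """
    if name_id.startswith(LSID_PREFIX):
        # urn:lsid:<authority>:<namespace>:<identifier>; any trailing
        # revision part stays attached to the identifier.
        authority, namespace, identifier = name_id[len(LSID_PREFIX):].split(":", 2)
        return authority, namespace, identifier
    prefix, term = name_id.split(":", 1)
    return prefix, "", term


def nonconforming_row(term: str, prefix: str = "ptwg", method: str = "manual") -> dict:
    """Fill the naming columns for a non-conforming ROI.

    scientificName and data_provider_category take the bare ID value, while
    scientificNameID takes the prefixed form.
    """
    return {
        f"data_provider_category_{method}": term,
        f"scientificName_{method}": term,
        f"scientificNameID_{method}": custom_id(prefix, term),
    }
```

For example, split_name_id applied to the Karenia brevis LSID separates out the WoRMS AphiaID 233015, while nonconforming_row("detritus") fills the three naming columns with 'detritus', 'detritus', and 'ptwg:detritus'.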

The PTWG YAML file can be found here.
Download a supplemental YAML file template (V20210325)
Download a supplemental YAML file example (V20210325)
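A custom namespace file can be produced without third-party libraries. This sketch assumes a layout with a top-level 'prefix' key and a 'terms' list; that layout is hypothetical, so follow the downloadable template for the actual structure. Each term carries the three keys required above (id, definition, associated_terms). Strings are quoted with json.dumps because JSON-style double-quoted strings are valid YAML scalars.

```python
"""Sketch: write a custom namespace file in YAML using only the stdlib.

Assumption: the 'prefix' + 'terms' layout is hypothetical; match the
official template. JSON-quoted strings double as valid YAML scalars.
"""
import json


def namespace_yaml(prefix: str, terms: list[dict]) -> str:
    """Return YAML text defining each term with id, definition, associated_terms."""
    lines = [f"prefix: {json.dumps(prefix)}", "terms:"]
    for term in terms:
        lines.append(f"  - id: {json.dumps(term['id'])}")
        lines.append(f"    definition: {json.dumps(term['definition'])}")
        # associated_terms is written as a YAML flow sequence.
        related = ", ".join(json.dumps(t) for t in term.get("associated_terms", []))
        lines.append(f"    associated_terms: [{related}]")
    return "\n".join(lines) + "\n"
```

For instance, a hypothetical 'MYNICKNAME' namespace could define 'opaque_detritus' with associated_terms ["ptwg:detritus"], linking the more specific term back to the PTWG definition; the definition text itself is for the data provider to write.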

Submission of images and ancillary data files

Data submissions should include an organized directory containing the raw images and any relevant instrument metadata on which the Level 1b files are based. These should be provided even if a version of the annotated images is hosted at another repository, such as EcoTaxa. If the raw images are hosted externally, this localized directory may contain either the individual images or a more efficient data format (for example, sample-level endpoint files for IFCB data). SeaBASS will create a compressed tar file with the raw images that are available for these source files. If the submission is extremely large, it should be split into more reasonable sizes (e.g., by year for a long time series). The filenames of the tar bundles must be provided in the /associated_archives metadata header; if that header contains multiple values, list them in a comma-separated format with no spaces.
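Splitting a long time series into per-year bundles and building the header value might look like the following. It assumes image files are named with an IFCB-style bin ID such as D20180201T103729_IFCB102 (the pattern used in example filenames in this guide). The archive naming pattern and function names are hypothetical.

```python
"""Sketch: split raw images into per-year tar bundles and build the header.

Assumptions: file names start with an IFCB-style bin ID (D + YYYYMMDD + T +
HHMMSS + _), and the '<stem>_<year>_images_associated.tgz' pattern is
hypothetical.
"""
import re
import tarfile
from collections import defaultdict
from pathlib import Path

BIN_YEAR = re.compile(r"^D(\d{4})\d{4}T\d{6}_")


def group_by_year(paths: list[Path]) -> dict[str, list[Path]]:
    """Group image paths by the year encoded in their bin ID."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        match = BIN_YEAR.match(path.name)
        if match is None:
            raise ValueError(f"unexpected file name: {path.name}")
        groups[match.group(1)].append(path)
    return dict(groups)


def write_bundles(groups: dict[str, list[Path]], out_dir: Path, stem: str) -> list[str]:
    """Write one gzip-compressed tar per year; return archive names in year order."""
    names = []
    for year in sorted(groups):
        name = f"{stem}_{year}_images_associated.tgz"
        with tarfile.open(out_dir / name, "w:gz") as tar:
            for path in groups[year]:
                tar.add(path, arcname=path.name)
        names.append(name)
    return names


def archives_header(names: list[str]) -> str:
    """Multiple values are comma-separated with no spaces."""
    return "/associated_archives=" + ",".join(names)
```

If a single year still exceeds the 5 GB limit given for /associated_archives, the same grouping idea can be applied by month.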

In addition to the raw images, the ancillary file sets necessary for processing the raw images must be submitted. For the McLane Labs Imaging FlowCytobot (IFCB), this includes the roi, hdr, and adc files. For the Yokogawa FlowCAM, this includes the lst, ctx, and edg files as well as the tif image files.
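A quick completeness check for the IFCB ancillary files is sketched below. It assumes the roi, hdr, and adc files of each sample share a file stem and differ only by extension; for FlowCAM, pass the extensions listed above instead. The function name is hypothetical.

```python
"""Sketch: confirm each IFCB sample has its full ancillary file set.

Assumption: the roi, hdr, and adc files of a sample share a stem and differ
only by extension.
"""
from collections import defaultdict
from pathlib import Path

REQUIRED = frozenset({".roi", ".hdr", ".adc"})


def missing_ancillary(file_names: list[str], required: frozenset = REQUIRED) -> dict[str, set]:
    """Map each sample stem to the extensions it lacks (incomplete samples only).

    A sample with none of the required files never appears in the input,
    so it cannot be detected this way.
    """
    seen: dict[str, set] = defaultdict(set)
    for name in file_names:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in required:
            seen[path.stem].add(suffix)
    return {stem: set(required - exts) for stem, exts in seen.items() if required - exts}
```

An empty result means every sample that appears in the listing has all three files.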

Assessed IDs list

Providing a list of all scientificName/scientificNameID pairs assessed by the automated classifier with the data submission makes it possible to determine both the presence and the absence of annotations in the Level 1b file. Supplementary lists of the taxonomic categories assessed by manual and/or automated classification methods are strongly recommended, and they are required if not every ROI in a given data file was classified. In that case, these lists are essential for the downstream creation of summary products involving the concentrations of phytoplankton taxa.

When every ROI is classified, these lists are useful for determining absence. The lists may be specific to a given water sample or data file (e.g., if only diatoms are classified in a sample), or they may cover every class in a classifier. Two lists are required for a given data file: one of all LSIDs assessed by automated classification and one of all LSIDs assessed by manual annotation. To link a given SeaBASS data file to its associated lists, include the list file names in the /documents header as a comma-separated list (no spaces).
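Why the assessed-ID lists matter for absence can be shown with set arithmetic: a taxon missing from the annotations counts as absent only if it was assessed. This sketch assumes both the assessed list and the Level 1b annotations have been reduced to sets of scientificNameID strings; the function name is hypothetical.

```python
"""Sketch: use an assessed-ID list to separate 'absent' from 'not assessed'.

Assumption: assessed IDs and annotated IDs are sets of scientificNameID
strings; the function name is hypothetical.
"""


def presence_absence(assessed_ids: set[str], annotated_ids: set[str]) -> dict[str, str]:
    """Label every assessed ID as 'present' or 'absent'.

    IDs that were never assessed are left out: their absence from the data
    file says nothing about whether the taxon was in the sample.
    """
    unexpected = annotated_ids - assessed_ids
    if unexpected:
        # An annotation outside the assessed list means the list is incomplete.
        raise ValueError(f"annotations not in the assessed list: {sorted(unexpected)}")
    return {sid: ("present" if sid in annotated_ids else "absent") for sid in sorted(assessed_ids)}
```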

For example: /documents=protocol_plankton_and_particles_NESLTER_IFCB102.txt,checklist_plankton_and_particles_NESLTER_IFCB102.txt,namespace_ptwg_nonconforming_roi_v1.txt,automated_assessed_id_D20180201T103729_IFCB102.txt,manual_assessed_id_D20180201T103729_IFCB102.txt

The following table is an example of an assessed ID table submission for manual annotation, in which all ROIs in the sample were annotated with the genera Chaetoceros and Strombidium. Although the data provider column lists nine different categories of Chaetoceros, the data provider was confident only to the genus level for eight of those nine categories and to the species level for one (Chaetoceros socialis), as indicated by the scientificNameID in the third column. Of the seven Strombidium categories, the data provider was confident to the species level for three, again as indicated by the scientificNameID.

data_provider_category_manual | scientificName_manual | scientificNameID_manual
--- | --- | ---
Chaetoceros | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Chaetoceros concavicornis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Chaetoceros curvusetus | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Chaetoceros danicus | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Chaetoceros debilis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Chaetoceros didymus | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Chaetoceros peruvianis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Chaetoceros socialis | Chaetoceros socialis | urn:lsid:marinespecies.org:taxname:149123
Chaetoceros subtilis | Chaetoceros | urn:lsid:marinespecies.org:taxname:148985
Strombidium capitatum | Strombidium capitatum | urn:lsid:marinespecies.org:taxname:101282
Strombidium conicum | Strombidium conicum | urn:lsid:marinespecies.org:taxname:101289
Strombidium inclinatum | Strombidium | urn:lsid:marinespecies.org:taxname:101195
Strombidium oculatum | Strombidium tintinnodes | urn:lsid:marinespecies.org:taxname:101337
Strombidium sp1 | Strombidium | urn:lsid:marinespecies.org:taxname:101195
Strombidium sp2 | Strombidium | urn:lsid:marinespecies.org:taxname:101195
Strombidium wulffi | Strombidium | urn:lsid:marinespecies.org:taxname:101195
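Grouping the example table by scientificName/scientificNameID pair makes the genus-versus-species pattern described above explicit. The rows are copied from the table; the summary function is an illustration only.

```python
"""Sketch: summarize an assessed-ID table by scientificName/scientificNameID pair."""
from collections import Counter

WORMS = "urn:lsid:marinespecies.org:taxname:"

# (data_provider_category_manual, scientificName_manual, scientificNameID_manual)
ROWS = [
    ("Chaetoceros", "Chaetoceros", WORMS + "148985"),
    ("Chaetoceros concavicornis", "Chaetoceros", WORMS + "148985"),
    ("Chaetoceros curvusetus", "Chaetoceros", WORMS + "148985"),
    ("Chaetoceros danicus", "Chaetoceros", WORMS + "148985"),
    ("Chaetoceros debilis", "Chaetoceros", WORMS + "148985"),
    ("Chaetoceros didymus", "Chaetoceros", WORMS + "148985"),
    ("Chaetoceros peruvianis", "Chaetoceros", WORMS + "148985"),
    ("Chaetoceros socialis", "Chaetoceros socialis", WORMS + "149123"),
    ("Chaetoceros subtilis", "Chaetoceros", WORMS + "148985"),
    ("Strombidium capitatum", "Strombidium capitatum", WORMS + "101282"),
    ("Strombidium conicum", "Strombidium conicum", WORMS + "101289"),
    ("Strombidium inclinatum", "Strombidium", WORMS + "101195"),
    ("Strombidium oculatum", "Strombidium tintinnodes", WORMS + "101337"),
    ("Strombidium sp1", "Strombidium", WORMS + "101195"),
    ("Strombidium sp2", "Strombidium", WORMS + "101195"),
    ("Strombidium wulffi", "Strombidium", WORMS + "101195"),
]


def categories_per_pair(rows):
    """Count provider categories mapped to each (scientificName, scientificNameID)."""
    return Counter((name, name_id) for _category, name, name_id in rows)


if __name__ == "__main__":
    for (name, name_id), n in sorted(categories_per_pair(ROWS).items()):
        # Print the AphiaID (the part after the last ':') and the category count.
        print(f"{name:24} {name_id.rsplit(':', 1)[-1]:>7} {n} categories")
```

The 16 provider categories collapse onto six pairs: eight Chaetoceros categories share the genus-level ID, and four Strombidium categories share theirs.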

Taxonomic lookup table

A taxonomic lookup table is essential to ensure that data provider categories (the names the data provider uses for an organism or particle in an automated classification, not necessarily scientific names, e.g., pennate or detritus) are accurately paired with their scientificName and scientificNameID. The scientificName/scientificNameID pairs can be determined manually by searching WoRMS, or automatically using web services with a script or with the WoRMS Taxon Match Graphical User Interface (GUI). When pairs have been determined manually, we recommend confirming that each scientificName/scientificNameID pair is accepted in WoRMS, either with the GUI or with an automated workflow in a script. When pairs are determined with web services, some manual cleanup may be required to ensure that the correct pairs are provided. Web services can also correct a misspelled scientificName and retrieve hierarchical ranks. Available scripts include the R packages ‘worms’ and ‘taxize’**. Further guidance for creating a lookup table can be found in Neeley et al., 2021, subchapters 2.2.1 and 2.2.2.
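Once a lookup table has been built and checked against WoRMS, applying it to classifier output is a dictionary lookup. In this sketch the table is a Python dict keyed by data provider category, with entries taken from examples elsewhere in this guide. Unmapped categories (such as 'pennate') are reported rather than guessed, and rows for them are withheld until the table is completed. The function name is hypothetical.

```python
"""Sketch: apply a taxonomic lookup table to automated classifier output.

Assumptions: the lookup table was built and verified beforehand (e.g., in
WoRMS); the function name is hypothetical. Entries come from this guide.
"""

LOOKUP = {
    "Karenia brevis": ("Karenia brevis", "urn:lsid:marinespecies.org:taxname:233015"),
    "Chaetoceros socialis": ("Chaetoceros socialis", "urn:lsid:marinespecies.org:taxname:149123"),
    "detritus": ("detritus", "ptwg:detritus"),
}


def annotate(categories: list[str], lookup: dict = LOOKUP) -> tuple[list[dict], set[str]]:
    """Return (rows for mapped categories, set of unmapped categories)."""
    rows, unmapped = [], set()
    for category in categories:
        if category not in lookup:
            # Never guess a name: flag it so the lookup table can be extended.
            unmapped.add(category)
            continue
        name, name_id = lookup[category]
        rows.append({
            "data_provider_category_automated": category,
            "scientificName_automated": name,
            "scientificNameID_automated": name_id,
        })
    return rows, unmapped
```

Keeping the table as data separate from the code makes it easy to re-verify every pair against WoRMS whenever the classifier's category list changes.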

**Disclaimer: Software package examples listed here do not constitute an endorsement or recommendation by NASA.

Example submission, plankton and other particles

Example SeaBASS data file. Note that the example file doesn't include the optional fields.
(Example SeaBASS file)
Last edited by Chris Proctor on 2023-03-07
Created by Chris Proctor on 2023-02-09