Data Format and Submission
To account for the continuous growth of the bio-optical data set and the wide variety of supported data types, the NASA Ocean Biology Processing Group felt it essential to develop efficient data ingestion and storage techniques. While this requires a specific data file format, the data protocols were designed to be as straightforward and effortless as possible for the contributor, while still offering a useful format for internal efforts. The system was intended to meet the following conditions: a simple data format that is easily expandable and flexible enough to accommodate large data sets; global portability across multiple computer platforms; and web-accessible data holdings with sufficient security to limit access to authorized users.
Data Submission Policy
All data collected under the auspices of the NASA Ocean Biology and Biogeochemistry (OB&B) program are to be submitted to SeaBASS within 1 year of the date of collection. For further details, please review the SeaBASS Access Policy.
Please consider the following while preparing data sets:
- SeaBASS data files are flat, two-dimensional ASCII text files in a format approved as a NASA Earth Science Data format by NASA ESDIS (Earth Science Data and Information Systems).
- Data are presented as a matrix of values, much like a spreadsheet.
- Columns may be delimited by spaces, tabs, or commas.
- Use consistent field delimiters.
- Data are preceded by a series of predefined metadata headers.
- The headers provide descriptive information about the data file, e.g., cruise name, date, and cloud cover.
- All SeaBASS field names and units (e.g., CHL, AOT) have been standardized.
- SeaBASS field names and units are not case sensitive.
- Headers may be arranged in any order provided that the first and last are /begin_header and /end_header.
- Most headers are required. Use a value of NA (not applicable) if information is unavailable.
- Use numeric blanks, such as -9999, for missing data. If applicable, separate values should be defined for measurements that were /above_detection_limit or /below_detection_limit.
- List latitude in decimal degrees, with coordinates north of the equator positive and south negative.
- List longitude in decimal degrees, with coordinates east of the Prime Meridian positive and west negative.
- List times in GMT (UTC).
- Acceptable combinations of time to be reported in the data matrix of the file are: date/time, year/month/day/hour/minute/second, year/month/day/time, or date/hour/minute/second.
- Year/sdy/hour/minute/second and year/sdy/time are also supported but not encouraged.
- If precision to the nearest second was not measured, please report seconds as top of the minute (00).
- Entries in the date, time, and location headers are the extreme values for the file (e.g., the farthest-north latitude, or the date and time of the first measurement).
- Only the time and location headers require bracketed () units. No other headers should include brackets.
- Headers should not include any white space. Separate words with an underscore. The only exception is for comment lines (beginning with !) which are allowed to contain spaces.
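As a rough illustration of the layout rules above, the following Python sketch splits a file into its header block and data matrix, flags white space in non-comment headers, and converts the numeric blank to None. It is not FCHECK and does not check for required headers. The function names (split_seabass, check_headers, parse_rows), the sample column layout and value formats, and the use of -9999 as the missing value are assumptions made for this example.

```python
MISSING = -9999.0  # example numeric blank; a file may define a different one

# Deliberately incomplete sample: a real file carries many more required headers.
# The columns (date, time, lat, lon, chl) are illustrative only.
SAMPLE = """/begin_header
/data_status=preliminary
! comment lines may contain spaces
/end_header
20170915,12:00:00,42.5,-70.25,0.31
20170915,12:10:00,42.6,-70.30,-9999
"""


def split_seabass(text):
    """Return (header_lines, data_lines), excluding /begin_header and /end_header."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].lower() != "/begin_header":
        raise ValueError("the first header must be /begin_header")
    for i, ln in enumerate(lines):
        if ln.lower() == "/end_header":
            return lines[1:i], lines[i + 1:]
    raise ValueError("no /end_header line found")


def check_headers(header_lines):
    """List headers that contain white space (comment lines starting with ! are exempt)."""
    problems = []
    for ln in header_lines:
        if ln.startswith("!"):
            continue
        if any(ch.isspace() for ch in ln):
            problems.append(f"white space in header: {ln!r}")
    return problems


def _is_missing(cell, missing):
    """True if the cell is numeric and equal to the missing-data value."""
    try:
        return float(cell) == missing
    except ValueError:
        return False


def parse_rows(data_lines, delimiter=None, missing=MISSING):
    """Split rows on one consistent delimiter (None means runs of spaces or tabs)."""
    rows = []
    for ln in data_lines:
        cells = [c.strip() for c in ln.split(delimiter)]
        rows.append([None if _is_missing(c, missing) else c for c in cells])
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows differ in column count; check the field delimiter")
    return rows


header, data = split_seabass(SAMPLE)
rows = parse_rows(data, delimiter=",")
```

Values are kept as strings here because columns such as time are not numeric; a fuller reader would convert each column according to its field name and units.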
° Intermediate data: submit intermediate products that were calculated as part of another reported value. An important example is submitting absorbance (i.e. optical density) measurements in spectrophotometer files along with the calculated absorption coefficients.
° Replicates and uncertainty: Submit information about uncertainty and/or replicates whenever applicable, typically as columns of standard deviation (e.g., <measurement>_sd, like "chl_sd") and bincount (plain bincount, or <measurement>_bincount if multiple bincount columns are required). Other forms of uncertainty reporting are accepted for cases where they are more appropriate than sd. Contact SeaBASS staff if you have other questions about preserving raw data or uncertainty.
Note: some protocols for discrete water samples call for each sample value to be calculated from multiple scans or filters (such as for extracted Chl or QFT). The SeaBASS reporting convention calls for such samples to be reported as a single row of data, along with the standard deviation and bincount. Replicate measurements can and should be reported separately, but each is defined and assumed to have been created from its own set of multiple scans.
° Level of data processing: Generally speaking, data should be calibrated, depth-adjusted (i.e. adjust depths based on any differences between sensors and the pressure transducer on the package), unbinned, with QAQC applied (i.e. bad data thrown out). Data submissions should be accompanied by the relevant calibration files and a description of the processing and analysis that went into producing output (see documentation requirements below). We recognize there might be situations where binning or other differences are necessary or appropriate. Please contact the SeaBASS administrator with questions.
° File names:
- File names must not contain spaces or special characters except for hyphens, underscores, and periods.
- The recommended file name suffix is ".sb". However, other suffixes such as .txt, .csv, or .dat are acceptable.
- File names are recommended to end in R#, where # is the release number starting with 1 for the first final version (e.g., myfile_R1.sb). If files have data_status=preliminary (i.e., the data are still being analyzed and it is highly likely they will be revised in the future) then it is recommended they are labeled "_R0" to indicate that.
- File names must be unique within a submission, and it is strongly recommended that they follow a descriptive pattern incorporating the measurement type, cruise name, date, depth, or other information (or abbreviations of these). A file naming pattern like <EXPERIMENT>-<CRUISE>-<DATATYPE>_<YYYYMMDDHHMM>_<RELEASE#>.sb generates unique file names that sort nicely within a directory and lets users understand a file's general contents at a glance. As hypothetical examples:
naames-naames_3-hplc_2017091512_R0.sb (example of a preliminary file, noted by "_R0")
naames-naames_3-hplc_2017091512_R1.sb (example of "final" release/version, i.e., R1)
naames-naames_3-hplc_2017091512_R2.sb (example of release 2 of a data file, i.e., it has been revised once)
tara-azores_laurient_acs-apcp_inline_201204300101_R1.sb (another example of the first release/version of a data file)
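The naming recommendations above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the helper names (build_name, check_name, find_duplicates) are hypothetical, and the check assumes that "no special characters" means ASCII letters, digits, hyphens, underscores, and periods only.

```python
import re
from collections import Counter

# Assumption: only ASCII letters, digits, hyphens, underscores, and periods are allowed.
ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
# Recommended ending: _R<release number> followed by a suffix such as .sb
RELEASE = re.compile(r"_R(\d+)\.[A-Za-z0-9]+$")


def build_name(experiment, cruise, datatype, stamp, release, suffix="sb"):
    """Assemble <EXPERIMENT>-<CRUISE>-<DATATYPE>_<timestamp>_R<release>.<suffix>."""
    return f"{experiment}-{cruise}-{datatype}_{stamp}_R{release}.{suffix}"


def check_name(name):
    """Return a list of problems; an empty list means the name passed both checks."""
    problems = []
    if not ALLOWED.fullmatch(name):
        problems.append("contains spaces or special characters")
    if not RELEASE.search(name):
        problems.append("does not end in _R<release number> (recommended)")
    return problems


def find_duplicates(names):
    """Return names that occur more than once within a submission."""
    return sorted(n for n, count in Counter(names).items() if count > 1)


preliminary = build_name("naames", "naames_3", "hplc", "2017091512", 0)
final = build_name("naames", "naames_3", "hplc", "2017091512", 1)
```

Here `preliminary` reproduces the first hypothetical example above and `final` the second; only the character check reflects a stated requirement, while the release ending is a recommendation.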
The following requirements must be met:
° Supporting documentation and calibration files must be included in the submission.
° The documentation and calibration files must match those listed in the appropriate headers in each file.
° The OBPG requires that documentation include cruise and instrument reports or logs.
° Please refer to the Documentation Guidelines for more detailed instructions including a template and several example documents.
Format Checking and Submission
- The OBPG maintains feedback software, FCHECK, to evaluate the format of data files.
- All data files should be tested using FCHECK prior to submission to SeaBASS.
- Data files, calibration files, and documentation are submitted to SeaBASS via SFTP (SSH File Transfer Protocol).
- The contributor should upload files using an SFTP client of choice. NASA does not endorse any particular SFTP client.
- Note: a username and an SSH key pair are required to access the SFTP site. Contact the SeaBASS Administrator to establish access. For instructions on this process see the Setting up SFTP Access documentation.
- Once an SFTP account is established you will be assigned a username. Use SFTP software of your choice to connect to the following link:
Substitute your personal username where that link says "yourusername". No password is explicitly needed beyond your SSH keys.
- Upon connecting, you will find two directories on the SeaBASS SFTP server: FCHECK and data_submission.
- The FCHECK directory is a location where a batch of SeaBASS files may be bulk checked via FCHECK. For details see the FCHECK documentation.
- The data_submission directory is to be used to upload and submit data and required documentation to the SeaBASS archive. Data MUST be placed in this directory to be considered as a submission to SeaBASS.
- Subdirectories should be created within the data_submission folder to organize data by project (i.e., experiment or cruise). An additional layer of subdirectories should also be used within the project directories to contain all supporting documentation.
- Within 24 hours after submitting files, an automated receipt will be emailed to the contact listed in the files' metadata header. If you do not receive a receipt, please contact the SeaBASS Administrator.
- The SeaBASS Administrator will collect the files and evaluate the data set, contacting the submitters with any questions about the data or documentation.
- Once the data are archived, the SeaBASS Administrator will update the new data page and contact the data submitters with a final confirmation.
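For contributors who script their uploads, the directory layout described above can be expressed as a batch file for the OpenSSH sftp client. The Python sketch below only builds the batch commands. The function name build_sftp_batch and the "documents" subdirectory name are hypothetical, and the server address is deliberately left out because it is supplied with your account. The leading "-" on mkdir tells sftp to continue if the directory already exists.

```python
from pathlib import PurePosixPath


def build_sftp_batch(project, data_files, doc_files, docs_dir="documents"):
    """Return sftp batch commands that place files under data_submission/<project>/.

    Assumes local file names contain no spaces, in line with the file name rules.
    """
    root = PurePosixPath("data_submission") / project
    docs = root / docs_dir  # hypothetical name for the documentation subdirectory
    commands = [f"-mkdir {root}", f"-mkdir {docs}"]
    commands += [f"put {name} {root}/" for name in data_files]
    commands += [f"put {name} {docs}/" for name in doc_files]
    return commands


batch = build_sftp_batch(
    "naames_3",
    ["naames-naames_3-hplc_2017091512_R1.sb"],
    ["cruise_report.pdf"],
)
```

The returned lines can be written to a text file and passed to the client with `sftp -b`, using the username and server address provided by the SeaBASS Administrator.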
Setting up SFTP Access
- An individual account must be registered to submit data using the SeaBASS SFTP server.
- In order to gain access to the SFTP server, we require a copy of your public SSH key, generated with key type ED25519 (file name id_ed25519.pub) or key type ECDSA (file name id_ecdsa.pub). We prefer the ED25519 key type if your system has a modern version of OpenSSH.
- Please email 1) your public ED25519 or ECDSA SSH key, 2) your first and last name, 3) your affiliation/institution name, and 4) your email address that will be linked to your SFTP account to the SeaBASS Administrator. Then, an individual SFTP account will be created and access for data submission and bulk FCHECK requests will be granted.
- Note: If you need to submit data from different computers, it is acceptable to send additional public keys and request that they are all linked to your SeaBASS SFTP account.
- Instructions on generating ED25519 or ECDSA SSH keys may be found here for Windows, Mac, Linux or BSD platforms.
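Before emailing a key, it can help to confirm that the file holds a public key of an accepted type. The sketch below relies on the fact that an OpenSSH public key line begins with its type identifier (ssh-ed25519, or ecdsa-sha2-nistp256/384/521 for ECDSA). The function name is_accepted_public_key is hypothetical, the key material in the example is a placeholder, and this check is an illustration rather than part of the SeaBASS process.

```python
# Type identifiers that OpenSSH writes at the start of ED25519 and ECDSA public keys.
ACCEPTED_TYPES = (
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
)


def is_accepted_public_key(line):
    """True if the line looks like '<type> <base64 key> [comment]' with an accepted type."""
    parts = line.strip().split()
    if len(parts) < 2:
        return False
    return parts[0] in ACCEPTED_TYPES


# Placeholder key material, for illustration only.
example = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIplaceholder user@example.org"
ok = is_accepted_public_key(example)
```

A private key file starts with a "-----BEGIN" line and is rejected by this check, which is a useful guard against sending the wrong file.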
Generating an SSH key
For Mac, Linux or BSD:
There are two options to generate an SSH key to be used for SFTP access to submit data to SeaBASS.
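As one common approach on these platforms (an assumption, and not necessarily either of the options referred to here), OpenSSH's ssh-keygen can create an ED25519 key pair. The Python sketch below wraps that command; the function names, the default key path, and the comment string are illustrative choices. ssh-keygen prompts for a passphrase interactively and writes the public key next to the private key with a .pub extension.

```python
import subprocess
from pathlib import Path


def keygen_command(key_path, comment):
    """Build the ssh-keygen call: -t key type, -f output file, -C comment."""
    return ["ssh-keygen", "-t", "ed25519", "-f", str(key_path), "-C", comment]


def generate_key(key_path=Path.home() / ".ssh" / "id_ed25519", comment="seabass-sftp"):
    """Run ssh-keygen and return the path of the public key (<key_path>.pub)."""
    key_path = Path(key_path)
    if key_path.exists():
        # Refuse to overwrite an existing private key.
        raise FileExistsError(f"{key_path} already exists")
    subprocess.run(keygen_command(key_path, comment), check=True)
    return key_path.with_name(key_path.name + ".pub")


command = keygen_command("/tmp/example/id_ed25519", "me@example.org")
```

Calling generate_key() requires an OpenSSH installation; only the resulting .pub file should be shared, never the private key.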
For Windows:
There are many options to generate a public SSH key to be used for SFTP access to submit data to SeaBASS.
- The SSH key must be of type ED25519.
- This page offers one example of how to generate a public SSH key on Windows via an open-source software solution.
- Once you have generated a public ED25519 SSH key, copy the entire public key string into a text file called "id_ed25519.pub" and email your public ED25519 SSH key, your first and last name, your affiliation (institution and PI names), and the email address that will be linked to your SFTP account to the SeaBASS Administrator.
- Note: this is the email address that MUST be used to request a batch check of SeaBASS files via FCHECK.