Datastore Configuration

This section describes the datastore configuration, which defines how data is organized and managed for processing. The configuration is written in YAML and specifies where the pipeline reads its input data and where it writes its output.

The following example shows a complete configuration:

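name: Bell_M._Shimada-SH1707-EK60
sonar_model: EK60
raw_regex: (.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})  # captures the date and time from each raw file name
args:  # input settings
  urlpath: s3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw  # source URL template
  parameters:  # values substituted into the urlpath placeholders
    ship_name: Bell_M._Shimada
    survey_name: SH1707
    sonar_model: EK60
  storage_options:
    anon: true  # anonymous access, no credentials required
  group:
    file: ./EK60_SH1707_Shimada.txt  # list of file names to process
  group_name: 2017
  json_export: true
output:  # output settings
  urlpath: ./echodataflow-output  # destination for the processed data
  retention: false  # store only the final output
  overwrite: true  # replace output that already exists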
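
To make the `urlpath` placeholders and `raw_regex` more concrete, the following minimal Python sketch resolves both using only the standard library. It is an illustration, not the pipeline's own implementation: the helpers `render_urlpath` and `parse_timestamp` and the sample file name are hypothetical, and the sketch assumes that the date and time groups hold full `YYYYMMDD` and `HHMMSS` values.

```python
import re
from datetime import datetime

# Values copied from the configuration above.
URLPATH = (
    "s3://ncei-wcsd-archive/data/raw/"
    "{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw"
)
PARAMETERS = {
    "ship_name": "Bell_M._Shimada",
    "survey_name": "SH1707",
    "sonar_model": "EK60",
}
RAW_REGEX = r"(.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})"


def render_urlpath(template: str, parameters: dict) -> str:
    """Replace each {{ key }} placeholder with its value (hypothetical helper)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: parameters[m.group(1)],
        template,
    )


def parse_timestamp(file_name: str) -> datetime:
    """Build a datetime from the date and time groups of raw_regex (hypothetical helper)."""
    match = re.match(RAW_REGEX, file_name)
    if match is None:
        raise ValueError(f"file name does not match raw_regex: {file_name}")
    # Assumes full YYYYMMDD and HHMMSS values, although the pattern allows shorter ones.
    return datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")


resolved = render_urlpath(URLPATH, PARAMETERS)
print(resolved)

# Hypothetical file name, used only to exercise the pattern.
stamp = parse_timestamp("Summer2017-D20170625-T124834.raw")
print(stamp.isoformat())
```

With these values, the template resolves to a glob over the `EK60` folder of the `SH1707` survey, and the named groups `date` and `time` supply the timestamp of each file.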

Note:

For a fuller description of each option and what it does, refer to the Datastore documentation.
The pipeline stores Target Strength output under ./echodataflow-output. Because retention is set to false, only the Target Strength files are stored. To select files for processing, list their names in EK60_SH1707_Shimada.txt and place that file under the transect directory.
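
As a rough sketch, the file list could be generated with a few lines of Python. The file names below are hypothetical, and the one-name-per-line layout is an assumption rather than a documented format, so check the documentation before relying on it. The sketch writes to the current directory, matching the `file: ./EK60_SH1707_Shimada.txt` entry in the configuration; adjust the path if your transect directory is elsewhere.

```python
from pathlib import Path

# Hypothetical file names; replace them with the raw files you want to process.
file_names = [
    "Summer2017-D20170625-T124834.raw",
    "Summer2017-D20170625-T132103.raw",
]

# Assumption: the list holds one file name per line.
list_path = Path("EK60_SH1707_Shimada.txt")
list_path.write_text("\n".join(file_names) + "\n", encoding="utf-8")

# Read the list back to confirm what was written.
written = list_path.read_text(encoding="utf-8").splitlines()
print(written)
```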

This configuration organizes the input and output data for the processing pipeline. Adapt it to your own data and processing requirements.