Datastore Configuration
This section describes the datastore configuration, which defines where the raw input files are found, how they are grouped for processing, and where the outputs are written. The configuration is provided in YAML format.
Here is the full configuration:
name: Bell_M._Shimada-SH1707-EK60
sonar_model: EK60
raw_regex: (.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})
args:
  urlpath: s3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw
  parameters:
    ship_name: Bell_M._Shimada
    survey_name: SH1707
    sonar_model: EK60
  storage_options:
    anon: true
  group:
    file: ./EK60_SH1707_Shimada.txt
  group_name: 2017
  json_export: true
output:
  urlpath: ./echodataflow-output
  retention: false
  overwrite: true
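To make the roles of raw_regex and the templated urlpath concrete, here is a minimal Python sketch. It is not how echodataflow itself resolves these fields; it only illustrates what the two values in the configuration above express. The helper names parse_raw_name and render_urlpath and the example file name are hypothetical, and the placeholder substitution is a plain standard-library stand-in for whatever templating the pipeline uses.

```python
import re

# raw_regex copied verbatim from the configuration above.
RAW_REGEX = re.compile(r"(.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})")


def parse_raw_name(filename):
    """Return the (date, time) fields captured by raw_regex, or None if no match."""
    match = RAW_REGEX.match(filename)
    if match is None:
        return None
    return match.group("date"), match.group("time")


def render_urlpath(template, parameters):
    """Replace each {{ name }} placeholder with parameters[name].

    Hypothetical helper: raises KeyError if a placeholder has no value.
    """
    def lookup(m):
        return str(parameters[m.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template)


# Values taken from the args section of the configuration.
URLPATH = (
    "s3://ncei-wcsd-archive/data/raw/"
    "{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw"
)
PARAMETERS = {
    "ship_name": "Bell_M._Shimada",
    "survey_name": "SH1707",
    "sonar_model": "EK60",
}

# Illustrative file name (an assumption, not taken from the source).
example_name = "Summer2017-D20170620-T011027.raw"

print(parse_raw_name(example_name))
# ('20170620', '011027')
print(render_urlpath(URLPATH, PARAMETERS))
# s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/*.raw
```

The named groups date and time are what let each raw file be associated with a timestamp, while the rendered urlpath is the glob pattern that selects the raw files in the bucket.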
Note:
For a fuller description of each option and what it does, refer to the Datastore documentation.
The pipeline stores the Target Strength output under ./echodataflow-output. Because retention is set to false, only the Target Strength files are stored. To specify which files to process, create a list of file names and store it in EK60_SH1707_Shimada.txt, which should be placed under the transect directory.
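As a rough illustration of preparing that file list, the sketch below writes file names to a text file and reads them back. The one-name-per-line layout, the helper names write_file_list and read_file_list, and the example file names are all assumptions; check the Datastore documentation for the exact format the pipeline expects. The demonstration writes into a temporary directory so nothing in your working tree is touched.

```python
import tempfile
from pathlib import Path


def write_file_list(names, target):
    """Write one file name per line to `target`, creating parent folders."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(names) + "\n", encoding="utf-8")
    return target


def read_file_list(target):
    """Read the list back, skipping blank lines."""
    text = Path(target).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Illustrative raw file names (assumptions, not taken from the source).
names = [
    "Summer2017-D20170620-T011027.raw",
    "Summer2017-D20170620-T014302.raw",
]

with tempfile.TemporaryDirectory() as tmp:
    # "transect" mirrors the directory mentioned in the text above.
    list_path = write_file_list(
        names, Path(tmp) / "transect" / "EK60_SH1707_Shimada.txt"
    )
    round_trip = read_file_list(list_path)

print(round_trip)
```

In practice you would point the target at the real transect directory and make sure the file entry under group in the configuration refers to the same location.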
This configuration facilitates efficient data organization and management for the processing pipeline. Feel free to tailor it to your specific data and processing requirements.