Dask Streams for Logging

Echodataflow uses Dask streams for logging by default. This ensures that logs generated by Dask workers are captured and can be redirected to a local file or to any other logging destination configured in the logging YAML. If logging is not configured, all worker messages are directed to the application console.

Dask Streams Configuration

In the example configuration below, logs from Dask streams are written to a local file through a rotating file handler. Unlike with CloudWatch, however, the order of log entries may not be preserved, because logs are written only once control returns from the Dask workers to the main application.
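Because entries can land in the file out of order, it may help to re-sort a log file by timestamp when reading it. The sketch below is one possible approach, not part of Echodataflow: `sort_log_lines` is a hypothetical helper, and it assumes every record starts with the `[%(asctime)s]` prefix in Python's default timestamp format, as in the example formatter on this page. Timestamps from workers on different machines may also be affected by clock skew, so the result is only approximately chronological.

```python
import re
from datetime import datetime

# Matches the "[%(asctime)s]" prefix in Python's default asctime format,
# e.g. "[2024-05-01 12:00:03,120] 4242 INFO ...".
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]")


def sort_log_lines(lines):
    """Return log lines in chronological order (hypothetical helper)."""
    records = []  # each entry: (timestamp, [lines belonging to that record])
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)
            records.append((stamp, [line]))
        elif records:
            # Continuation line (for example a traceback): keep it with its record.
            records[-1][1].append(line)
        else:
            # Leading line without a timestamp: sort it to the front.
            records.append((datetime.min, [line]))
    # list.sort() is stable, so records with equal timestamps keep file order.
    records.sort(key=lambda record: record[0])
    return [line for _, group in records for line in group]


sample = [
    "[2024-05-01 12:00:03,120] 4242 INFO echodataflow:flow:10 - worker B done",
    "Traceback-style continuation line for worker B",
    "[2024-05-01 12:00:01,050] 4241 INFO echodataflow:flow:10 - worker A done",
]
ordered = sort_log_lines(sample)
```

The sample lines are made up for illustration. Continuation lines stay attached to the record they follow, so multi-line messages are not split apart by the sort.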

Example

Configuring Dask Streams Logging

Create a logging YAML file to configure Dask streams logging for Echodataflow. Below is an example configuration:

version: 1
disable_existing_loggers: False
formatters:
  json:
    format: "[%(asctime)s] %(process)d %(levelname)s %(name)s:%(funcName)s:%(lineno)s - %(message)s"
  plaintext:
    format: "[%(asctime)s] %(process)d %(levelname)s %(name)s:%(funcName)s:%(lineno)s - %(message)s"
handlers:
  logfile:
    class: logging.handlers.RotatingFileHandler
    formatter: plaintext
    level: DEBUG
    filename: echodataflow.log
    maxBytes: 1000000
    backupCount: 3
loggers:
  echodataflow:
    level: DEBUG
    propagate: False
    handlers: [logfile]
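The YAML above follows the dictionary schema of Python's standard `logging.config` module. To see what the `logfile` handler and the `echodataflow` logger do, the same settings can be applied directly with `logging.config.dictConfig`. This is a minimal sketch for experimenting with the configuration on its own; it does not show how Echodataflow loads the file internally, and the temporary log path is an assumption made so the example does not write into the working directory.

```python
import logging
import logging.config
import os
import tempfile

# Assumption: write to a temporary directory instead of ./echodataflow.log.
log_path = os.path.join(tempfile.mkdtemp(), "echodataflow.log")

# Python equivalent of the plaintext formatter, handler, and logger above.
config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plaintext": {
            "format": "[%(asctime)s] %(process)d %(levelname)s "
                      "%(name)s:%(funcName)s:%(lineno)s - %(message)s"
        }
    },
    "handlers": {
        "logfile": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "plaintext",
            "level": "DEBUG",
            "filename": log_path,
            "maxBytes": 1000000,  # roll over after roughly 1 MB
            "backupCount": 3,     # keep up to three rotated files
        }
    },
    "loggers": {
        "echodataflow": {
            "level": "DEBUG",
            "propagate": False,  # do not pass records on to the root logger
            "handlers": ["logfile"],
        }
    },
}

logging.config.dictConfig(config)
logger = logging.getLogger("echodataflow")
logger.info("logging configured")

# Flush the handler so the record is on disk before reading it back.
for handler in logger.handlers:
    handler.flush()

with open(log_path) as log_file:
    contents = log_file.read()
```

With `maxBytes: 1000000` and `backupCount: 3`, the handler rolls the file over at roughly 1 MB and keeps up to three older files alongside the active one.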

Integrating the Logging Configuration

Finally, pass this YAML configuration file, along with the dataset and pipeline configurations, when initializing Echodataflow.

Here is an example of how to integrate the logging configuration into your Echodataflow initialization:

# Example initialization of echodataflow
from echodataflow import echodataflow_start

dataset_config = 'path_to_dataset_config'
pipeline_config = 'path_to_pipeline_config'
logging_config = 'path_to_logging_config_yaml'  # the logging YAML created above
options = {}  # Add your options here

data = echodataflow_start(
    dataset_config=dataset_config,
    pipeline_config=pipeline_config,
    logging_config=logging_config,
    options=options,
)