Implementing Logging with Kafka and Elastic Stack#
Overview#
Echodataflow integrates with Kafka to send worker messages to a Kafka topic, from which they can be routed to any logging destination. In this example, we set up an Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) with Kafka integration to analyze and visualize logs in real time.
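To make the idea of "sending worker messages to a Kafka topic" concrete, here is a minimal sketch of a Python logging handler that forwards each record to a topic. It is not Echodataflow's actual implementation: `TopicHandler` and `kafka_sender` are hypothetical names, the JSON field layout is an assumption, and `kafka_sender` assumes the third-party kafka-python package is installed.

```python
import json
import logging


class TopicHandler(logging.Handler):
    """Hypothetical handler: serialises each record to JSON and passes it to `send`."""

    def __init__(self, topic, send):
        super().__init__()
        self.topic = topic
        self.send = send  # callable taking (topic: str, payload: bytes)

    def emit(self, record):
        try:
            payload = json.dumps({
                "logger": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
                "created": record.created,
            }).encode("utf-8")
            self.send(self.topic, payload)
        except Exception:
            # Standard logging convention: never let a logging failure crash the worker.
            self.handleError(record)


def kafka_sender(servers):
    """Return a send(topic, payload) callable backed by kafka-python (assumed installed)."""
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers)
    return lambda topic, payload: producer.send(topic, payload)
```

Attaching `TopicHandler("echodataflow_logs", kafka_sender(["localhost:9092"]))` to a logger would publish every record from that logger to the topic. Passing the sender in as a callable keeps the handler usable without a running broker, for example in unit tests.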
Setting Up the Infrastructure#
Dockerized Elastic Stack with Kafka Integration
This setup includes:
- Elasticsearch for powerful search and data analytics
- Logstash for log processing and ingestion into Elasticsearch
- Kibana for data visualization with dashboards and analytics
- Apache Kafka with Zookeeper for reliable log streaming
- Schema Registry for managing Kafka data schemas
- Confluent Control Center for Kafka cluster management and monitoring
- Metricbeat for monitoring system metrics and Docker containers
Prerequisites
Ensure you have Docker and Docker Compose installed on your machine. A basic understanding of Docker, the ELK Stack, and Apache Kafka is recommended.
Quick Start#
Clone the Repository
git clone https://github.com/Sohambutala/StreamLogging.git
cd StreamLogging
Start the Services
docker-compose up -d
# This command starts all the services defined in the docker-compose.yml file in detached mode.
Access the Services
- Kibana: Open http://localhost:5601 to access Kibana's web interface.
- Confluent Control Center: Visit http://localhost:9021 to manage and monitor your Kafka cluster.
- Elasticsearch: Accessible at http://localhost:9200.
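The containers can take a while to become ready after `docker-compose up -d`. As a rough check, the sketch below probes the three URLs listed above using only the Python standard library. The helper names `is_up` and `check_all` are illustrative, and treating any HTTP response (including an error status) as "up" is an assumption about what counts as ready.

```python
import urllib.error
import urllib.request

# Endpoints taken from the list above.
SERVICES = {
    "Kibana": "http://localhost:5601",
    "Confluent Control Center": "http://localhost:9021",
    "Elasticsearch": "http://localhost:9200",
}


def is_up(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (e.g. 401 when security is enabled), so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: not reachable yet.
        return False


def check_all(services=SERVICES, probe=is_up):
    """Map each service name to True/False using the given probe."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_all()` returns a dictionary such as `{"Kibana": True, ...}`; a `False` entry usually means the container is still starting or the port mapping differs from the defaults above.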
Monitor with Metricbeat
Metricbeat is configured to collect metrics from Docker containers and the host system. Check the Metricbeat dashboards in Kibana for insights.
Configuration#
You can customize the configurations for each service by modifying their respective configuration files in the repository:
- Elasticsearch: ./elasticsearch/elasticsearch.yml
- Logstash: ./logstash/config/logstash.yml and the pipeline files in ./logstash/pipeline/
- Kibana: ./kibana/kibana.yml
- Metricbeat: ./metricbeat/metricbeat.yml
Stopping the Services#
To stop all services and remove containers, networks, and volumes created by docker-compose up, run:
docker-compose down -v
Configuring Kafka Logging in Echodataflow#
To enable Kafka logging, add a logging section with your Kafka topic and servers to the datastore.yaml file used by Echodataflow.
Example datastore.yaml#
name: Bell_M._Shimada-SH1707-EK60
sonar_model: EK60
raw_regex: (.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})
args:
  urlpath: s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/{{file_name}}.raw
  parameters:
    file_name: Summer2017-D20170623-T233948
  storage_options:
    anon: true
  json_export: true
output:
  urlpath: ./echodataflow-output
  retention: true
  overwrite: true
# Kafka Config
logging:
  kafka:
    topic: echodataflow_logs
    servers:
      - localhost:9092
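Once a flow has run with this configuration, one way to confirm that messages reach the topic is to read it back directly. The sketch below assumes the third-party kafka-python package is installed and uses the topic and server from the example above. `decode_log` and `tail_topic` are illustrative names, and because the message format is not specified here, the decoder falls back to plain text when a value is not valid JSON.

```python
import json


def decode_log(raw: bytes):
    """Decode one Kafka message value; fall back to plain text if it is not JSON."""
    text = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def tail_topic(topic="echodataflow_logs", servers=("localhost:9092",)):
    """Print the messages retained on the topic, stopping after 10 s of silence."""
    # Assumption: kafka-python is installed (pip install kafka-python).
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=list(servers),
        auto_offset_reset="earliest",  # start from the oldest retained message
        consumer_timeout_ms=10_000,    # stop iterating after 10 s without messages
    )
    try:
        for message in consumer:
            if message.value is not None:
                print(decode_log(message.value))
    finally:
        consumer.close()
```

Running `tail_topic()` from a Python shell on the Docker host should print each log message. If nothing appears, check that the broker is reachable at the address listed under `servers`.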