Monitoring your environments, containerized or not, is critical for maintaining the performance and reliability of modern applications. This blog post details the setup of a monitoring stack in Docker Swarm using Prometheus, Grafana, and InfluxDB. We'll focus on a practical example that uses NFS for shared storage, an overlay network for communication between services, and Swarm's orchestration capabilities. For privacy, paths and passwords have been anonymized.
Overview of the Stack
This stack provides the following functionalities:
- Prometheus: Time-series database and monitoring tool to collect and store metrics.
- Grafana: Visualization and dashboarding tool to display metrics in an intuitive manner.
- InfluxDB: Another time-series database that offers flexibility for storing specific types of metrics.
Each service runs on a Docker Swarm manager node and leverages NFS volumes for persistent storage.
Docker Compose Configuration
The docker-compose.yml file forms the backbone of the stack. The configuration is shown below section by section; the three service definitions belong under the file's top-level services: key.
Prometheus Configuration
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - nfs-prometheus:/etc/prometheus
      - nfs-prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=365d'
      - '--web.enable-admin-api'
    deploy:
      mode: replicated
      placement:
        constraints:
          - node.role == manager
      replicas: 1
      resources:
        limits:
          memory: 500M
      restart_policy:
        condition: on-failure
    networks:
      - monitoring
Grafana Configuration
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3030:3000"
    volumes:
      - nfs-grafana:/var/lib/grafana
    deploy:
      mode: replicated
      placement:
        constraints:
          - node.role == manager
      replicas: 1
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=<redacted>
    networks:
      - monitoring
InfluxDB Configuration
  influxdb:
    image: influxdb:latest
    ports:
      - "8086:8086"
    volumes:
      - nfs-influxdb:/var/lib/influxdb2
    deploy:
      mode: replicated
      placement:
        constraints:
          - node.role == manager
      replicas: 1
    environment:
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=<redacted>
    networks:
      - monitoring
Overlay Network
networks:
  monitoring:
    driver: overlay
NFS Volumes
volumes:
  nfs-influxdb:
    driver: local
    driver_opts:
      device: :/anonymized/path/influxdb
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
  nfs-grafana:
    driver: local
    driver_opts:
      device: :/anonymized/path/grafana
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
  nfs-prometheus:
    driver: local
    driver_opts:
      device: :/anonymized/path/prometheus
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
  nfs-prometheus-data:
    driver: local
    driver_opts:
      device: :/anonymized/path/prometheus/data
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
Key Configuration Details
Prometheus
- Purpose: Central component for collecting metrics from various targets.
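- Storage: Data is persisted in the nfs-prometheus-data volume so that it survives container restarts and rescheduling. Keep in mind that the Prometheus documentation lists NFS as unsupported for its local storage, so this layout is best suited to small setups.
- Retention Policy: Metrics are stored for one year (--storage.tsdb.retention.time=365d).
- Networking: Connected to the monitoring overlay network for inter-service communication.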
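The service definition points Prometheus at /etc/prometheus/prometheus.yml, which lives in the nfs-prometheus volume but is not shown in this post. A minimal sketch of such a file might look like the following. The intervals and the job name are illustrative choices rather than the original configuration, and the only target is Prometheus itself.

```yaml
# /etc/prometheus/prometheus.yml (assumed content, not the original file)
global:
  scrape_interval: 15s      # illustrative value
  evaluation_interval: 15s  # illustrative value

scrape_configs:
  - job_name: prometheus    # hypothetical job name
    static_configs:
      # Prometheus scraping its own /metrics endpoint
      - targets: ['localhost:9090']
```

Further jobs for exporters or application endpoints would be added as extra entries under scrape_configs.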
Grafana
- Purpose: Visualizes Prometheus and InfluxDB metrics.
- Storage: Dashboards and configurations are stored in nfs-grafana for persistence.
- Security: Admin credentials are provided via plain environment variables in this example; they can be moved to Docker Swarm secrets if needed.
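Because both services share the monitoring overlay network, Grafana can reach Prometheus by its service name. One way to express that is a Grafana data source provisioning file. The sketch below assumes the file is placed under /etc/grafana/provisioning/datasources/ inside the container, a path this stack does not mount, so the file name and the mount are hypothetical.

```yaml
# datasources.yml (hypothetical file name)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend issues the queries
    url: http://prometheus:9090   # Swarm service name and container port
    isDefault: true
```

The same data source can also be added by hand in the Grafana UI using the same URL.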
InfluxDB
- Purpose: Complements Prometheus by storing time-series data for specific use cases.
- Storage: Data is persisted in the nfs-influxdb volume.
- Environment Variables: Admin credentials are set through environment variables.
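The /var/lib/influxdb2 mount path suggests an InfluxDB 2.x image. In the official 2.x image, initial setup is driven by DOCKER_INFLUXDB_INIT_* variables, while INFLUXDB_ADMIN_USER and INFLUXDB_ADMIN_PASSWORD belong to the 1.x image. If the stack runs 2.x, the service definition might be adjusted along the lines of this sketch. Pinning the tag to the 2.x line is an assumption, and the organization and bucket names are placeholders, not values from the original setup.

```yaml
services:
  influxdb:
    image: influxdb:2   # assumption: pin to the 2.x line instead of latest
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=<redacted>
      - DOCKER_INFLUXDB_INIT_ORG=example-org        # placeholder
      - DOCKER_INFLUXDB_INIT_BUCKET=example-bucket  # placeholder
```

These variables only take effect on first start, when the data directory is still empty.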
Deployment Steps
Step 1: Initialize Docker Swarm
Run the following command to initialize Docker Swarm:
docker swarm init
Step 2: Deploy the Stack
Save the docker-compose.yml file and deploy the stack:
docker stack deploy -c docker-compose.yml monitoring-stack
Step 3: Verify Services
Check the status of the services to ensure they are running correctly:
docker service ls
Step 4: Access the Interfaces
- Prometheus: Accessible at http://<manager-node-ip>:9090
- Grafana: Accessible at http://<manager-node-ip>:3030
- InfluxDB: Accessible at http://<manager-node-ip>:8086
Enhancements and Best Practices
Secrets Management
Replace hardcoded passwords with Docker Swarm secrets for improved security.
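As a rough sketch of what this could look like for Grafana: Grafana accepts a __FILE suffix on its environment variables and reads the value from the named file, which pairs well with Swarm secrets mounted under /run/secrets. The secret name grafana_admin_password is hypothetical, and the secret is assumed to have been created beforehand with docker secret create.

```yaml
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      # Grafana reads the password from the file named here
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password   # hypothetical secret name

secrets:
  grafana_admin_password:
    external: true   # created outside the stack file
```

With this in place, the password no longer appears in the stack file or in the output of docker service inspect.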
Resource Limits
Ensure resource limits are configured appropriately for production environments.
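In the stack above only Prometheus has a memory limit. A sketch of how limits and reservations could be added to another service is shown below; the numbers are illustrative and should be sized from observed usage, not copied as-is.

```yaml
services:
  grafana:
    deploy:
      resources:
        limits:
          cpus: '0.50'   # illustrative value
          memory: 512M   # illustrative value
        reservations:
          memory: 256M   # illustrative value
```

Limits cap what a container may consume, while reservations tell the Swarm scheduler how much capacity a node must have free before the task is placed there.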
Alerting
Configure alerting rules in Prometheus for proactive monitoring.
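A minimal example of a rule file, assuming it is saved as /etc/prometheus/alerts.yml (a hypothetical file name) and referenced from prometheus.yml through a rule_files entry. The alert name, duration, and severity label are illustrative.

```yaml
# /etc/prometheus/alerts.yml (hypothetical file name)
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0        # the scrape of a target is failing
        for: 5m              # must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
```

Prometheus only evaluates the rules and marks alerts as firing. Delivering notifications requires an Alertmanager, which is not part of this stack.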
Backup Strategy
Implement periodic backups for NFS volumes to safeguard data.
Conclusion
This monitoring stack demonstrates how to leverage Docker Swarm's capabilities to deploy resilient and scalable monitoring solutions. By integrating Prometheus, Grafana, and InfluxDB, this setup provides a comprehensive toolkit for observing and maintaining your containerized applications effectively. With persistent NFS storage and an overlay network, the stack is both reliable and adaptable to diverse requirements.