Building a Monitoring Stack on Docker Swarm with Prometheus, Grafana, and InfluxDB

Monitoring environments, containerized or not, is critical for maintaining the performance and reliability of modern applications. This post walks through setting up a monitoring stack on Docker Swarm with Prometheus, Grafana, and InfluxDB. It uses a practical example built on NFS for shared storage, an overlay network for service-to-service communication, and Swarm for orchestration. For privacy, paths and passwords have been anonymized.

Overview of the Stack

This stack provides the following functionalities:

  • Prometheus: Time-series database and monitoring tool to collect and store metrics.
  • Grafana: Visualization and dashboarding tool to display metrics in an intuitive manner.
  • InfluxDB: A second time-series database that accepts pushed writes over its HTTP API (for example, from Telegraf agents or application clients), complementing Prometheus's pull-based scraping.

Each service is pinned to a Docker Swarm manager node by a placement constraint and keeps its data on NFS-backed volumes for persistent storage.


Docker Compose Configuration

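The docker-compose.yml file forms the backbone of the stack. The three service definitions below belong under a top-level services: key; the networks: and volumes: sections that follow them are top-level keys of their own.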

Prometheus Configuration

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - nfs-prometheus:/etc/prometheus
      - nfs-prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=365d'
      - '--web.enable-admin-api'
    deploy:
      mode: replicated
      placement:
        constraints:
          - node.role == manager
      replicas: 1
      resources:
        limits:
          memory: 500M
      restart_policy:
        condition: on-failure
    networks:
      - monitoring

Grafana Configuration

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3030:3000"
    volumes:
      - nfs-grafana:/var/lib/grafana
    deploy:
      mode: replicated
      placement:
        constraints:
          - node.role == manager   
      replicas: 1
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=<redacted>
    networks:
      - monitoring

InfluxDB Configuration

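  influxdb:
    # Pinned to the 2.x line, which the /var/lib/influxdb2 path and the
    # variables below assume.
    image: influxdb:2
    ports:
      - "8086:8086"
    volumes:
      - nfs-influxdb:/var/lib/influxdb2
    deploy:
      mode: replicated
      placement:
        constraints:
          - node.role == manager
      replicas: 1
    environment:
      # The 2.x image reads DOCKER_INFLUXDB_INIT_* (the 1.x INFLUXDB_ADMIN_*
      # names are ignored). Setup runs only on first start with an empty volume.
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=<redacted>
      - DOCKER_INFLUXDB_INIT_ORG=<org-name>
      - DOCKER_INFLUXDB_INIT_BUCKET=<bucket-name>
    networks:
      - monitoring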

Overlay Network

networks:
  monitoring:
    driver: overlay

NFS Volumes

volumes:
  nfs-influxdb:
    driver: local
    driver_opts:
      device: :/anonymized/path/influxdb
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
  nfs-grafana:
    driver: local
    driver_opts:
      device: :/anonymized/path/grafana
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
  nfs-prometheus:
    driver: local
    driver_opts:
      device: :/anonymized/path/prometheus
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
  nfs-prometheus-data:
    driver: local
    driver_opts:
      device: :/anonymized/path/prometheus/data
      o: addr=<nfs-server-address>,nolock,soft,rw,nfsvers=4
      type: nfs
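Assembled into one file, the pieces above take roughly the following shape. This is a skeleton with the bodies elided (each "# ..." comment stands for a block shown earlier), so it is not deployable as-is. The version line is included on the assumption that your Docker release's docker stack deploy expects a 3.x file format.

```yaml
version: "3.8"  # 3.x file format, as expected by docker stack deploy

services:
  prometheus:
    # ... Prometheus definition from above ...
  grafana:
    # ... Grafana definition from above ...
  influxdb:
    # ... InfluxDB definition from above ...

networks:
  monitoring:
    driver: overlay

volumes:
  nfs-influxdb:
    # ... driver and driver_opts from above ...
  nfs-grafana:
    # ... driver and driver_opts from above ...
  nfs-prometheus:
    # ... driver and driver_opts from above ...
  nfs-prometheus-data:
    # ... driver and driver_opts from above ...
```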

Key Configuration Details

Prometheus

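  • Purpose: Central component that scrapes metrics from the targets listed in prometheus.yml, which it reads from the nfs-prometheus volume mounted at /etc/prometheus.
  • Storage: TSDB data is persisted in the nfs-prometheus-data volume, so a task rescheduled onto another manager finds the existing data. Note that the Prometheus storage documentation lists NFS as unsupported for its local storage and warns of possible unrecoverable corruption, so treat this as a convenience rather than a durability guarantee.
  • Retention Policy: Metrics are kept for one year (--storage.tsdb.retention.time=365d).
  • Admin API: --web.enable-admin-api exposes unauthenticated endpoints for deleting series and taking TSDB snapshots. Because port 9090 is published, restrict who can reach it.
  • Networking: Attached to the monitoring overlay network, where other services reach it by service name (http://prometheus:9090).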
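The --config.file flag points Prometheus at /etc/prometheus/prometheus.yml, which the stack expects to find on the NFS share but does not show. Below is a minimal sketch of that file. It assumes you want Prometheus to scrape itself plus the other two services, and the 30s interval is an illustrative choice. Targets use container ports (3000 for Grafana), not the published 3030, because the traffic stays on the overlay network.

```yaml
# prometheus.yml: placed in the NFS export behind the nfs-prometheus volume,
# so the container sees it at /etc/prometheus/prometheus.yml.
global:
  scrape_interval: 30s      # illustrative; Prometheus defaults to 1m
  evaluation_interval: 30s  # how often rules are evaluated

scrape_configs:
  # Prometheus scraping its own /metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Grafana and InfluxDB 2.x both expose Prometheus-format metrics at /metrics.
  # Service names resolve through Swarm DNS on the shared overlay network.
  - job_name: grafana
    static_configs:
      - targets: ["grafana:3000"]

  - job_name: influxdb
    static_configs:
      - targets: ["influxdb:8086"]
```

Running promtool check config prometheus.yml before deploying catches syntax errors; promtool ships in the prom/prometheus image.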

Grafana

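  • Purpose: Visualizes metrics from Prometheus and InfluxDB once they are added as data sources, either in the UI or through provisioning files.
  • Storage: Dashboards, users, and settings live in Grafana's database under /var/lib/grafana, which is persisted on the nfs-grafana volume.
  • Security: Admin credentials are passed as plain environment variables in this file. Grafana also accepts a __FILE suffix on its variables (for example, GF_SECURITY_ADMIN_PASSWORD__FILE), which lets the password come from a Docker Swarm secret instead.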
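Instead of adding the data sources by hand in the UI, you can have Grafana load them from a provisioning file at startup. The sketch below assumes two things: an additional mount into /etc/grafana/provisioning/datasources/ (not part of the compose file above), and an InfluxDB 2.x instance queried with Flux. The organization, bucket, and token values are placeholders.

```yaml
# Grafana data source provisioning file (for example datasources.yaml),
# read from /etc/grafana/provisioning/datasources/ at startup.
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend makes the requests
    url: http://prometheus:9090    # service name + container port on the overlay
    isDefault: true

  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    jsonData:
      version: Flux                # query language used with InfluxDB 2.x
      organization: <org-name>     # placeholder
      defaultBucket: <bucket-name> # placeholder
    secureJsonData:
      token: <influxdb-api-token>  # placeholder: an InfluxDB API token
```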

InfluxDB

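  • Purpose: Complements Prometheus for data that is pushed rather than scraped, such as metrics written by Telegraf or by applications through the HTTP API.
  • Storage: Data is persisted in the nfs-influxdb volume mounted at /var/lib/influxdb2.
  • Environment Variables: Admin credentials are set through environment variables. For the InfluxDB 2.x image (which the /var/lib/influxdb2 path implies), these are the DOCKER_INFLUXDB_INIT_* variables, applied only on the first start against an empty data volume; the 1.x-style INFLUXDB_ADMIN_* names are not read.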

Deployment Steps

Step 1: Initialize Docker Swarm

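Run the following command on the node that will become the first manager. On a host with several network interfaces, add --advertise-addr <manager-ip> to choose the address other nodes will use: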

docker swarm init

Step 2: Deploy the Stack

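Before deploying, create the exported directories on the NFS server, because the volume mounts fail if they are missing. Make them writable by the users the containers run as, and place prometheus.yml in the Prometheus config directory. Then deploy the stack from a manager node: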

docker stack deploy -c docker-compose.yml monitoring-stack

Step 3: Verify Services

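Check that each service reports 1/1 in the REPLICAS column:

docker service ls

If a service stays at 0/1, docker service ps --no-trunc <service-name> shows the error that stopped its tasks, and docker service logs <service-name> shows its output.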

Step 4: Access the Interfaces

  • Prometheus: Accessible at http://<manager-node-ip>:9090
  • Grafana: Accessible at http://<manager-node-ip>:3030
  • InfluxDB: Accessible at http://<manager-node-ip>:8086

Enhancements and Best Practices

Secrets Management

Replace the plain-text passwords in the environment sections with Docker Swarm secrets, which Swarm delivers to containers as files under /run/secrets/.
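Here is a sketch of the Grafana service reading its admin password from a Swarm secret through the __FILE variable. The secret name grafana_admin_password is hypothetical. The secret must exist before deploying, for example via printf '%s' 'your-password' | docker secret create grafana_admin_password -, and top-level secrets require compose file format 3.1 or later.

```yaml
services:
  grafana:
    # ... image, ports, volumes, deploy, networks as above ...
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      # The __FILE suffix tells Grafana to read the value from this path.
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
    secrets:
      - grafana_admin_password   # mounted at /run/secrets/grafana_admin_password

secrets:
  grafana_admin_password:
    external: true               # created beforehand with docker secret create
```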

Resource Limits

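Only Prometheus has a memory limit in this file. Set limits (and reservations) for every service based on observed usage; Prometheus memory grows with the number of active series, so 500M may become tight as targets are added.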
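As an illustration, the deploy sections of Grafana and InfluxDB could gain reservations and limits like the following, merged into their existing deploy blocks. The figures are placeholders, not recommendations; base real values on what docker stats reports under normal load.

```yaml
services:
  grafana:
    deploy:
      resources:
        reservations:
          memory: 128M   # Swarm only schedules the task where this much is unreserved
        limits:
          cpus: "0.5"    # at most half a CPU core
          memory: 512M   # the container is OOM-killed if it exceeds this
  influxdb:
    deploy:
      resources:
        reservations:
          memory: 256M
        limits:
          memory: 1G
```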

Alerting

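Configure alerting rules in Prometheus for proactive monitoring. Prometheus only evaluates the rules; delivering notifications (email, chat, paging) requires an Alertmanager instance, which this stack does not yet include.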
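Here is a minimal rule file, assuming the hypothetical name alert-rules.yml stored alongside prometheus.yml on the NFS share. It fires when any scrape target has been unreachable for five minutes.

```yaml
# alert-rules.yml (hypothetical name), visible in the container at
# /etc/prometheus/alert-rules.yml.
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0          # the last scrape of this target failed
        for: 5m                # must stay true for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
```

Reference it from prometheus.yml with rule_files: [/etc/prometheus/alert-rules.yml] and check it with promtool check rules alert-rules.yml. Prometheus then needs a restart or a SIGHUP to load the rules, because the HTTP /-/reload endpoint requires --web.enable-lifecycle, which this stack does not set. Firing alerts appear on the Alerts page of the Prometheus UI even without Alertmanager.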

Backup Strategy

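Implement periodic backups of the NFS exports to safeguard data. Copying live database files can produce an inconsistent backup, so prefer each tool's own mechanism. With the admin API enabled, POST /api/v1/admin/tsdb/snapshot makes Prometheus write a snapshot under the snapshots/ directory of its data path, and the influx backup command exports an InfluxDB 2.x instance.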


Conclusion

This monitoring stack shows how Docker Swarm can run Prometheus, Grafana, and InfluxDB together as a single deployable unit. Each service runs as one replica pinned to a manager node. Because the data lives on NFS, a task that Swarm reschedules onto another manager remounts the same export and resumes with its existing data. The overlay network lets the services find each other by name. Together, these give a compact starting point that can be hardened with the secrets, resource limits, alerting, and backups described above.