Monitoring and Logging with Prometheus and Grafana

Running applications in Kubernetes requires robust monitoring and logging to ensure performance, reliability, and quick troubleshooting. Prometheus and Grafana are among the most widely used observability tools for Kubernetes. Prometheus collects and stores metrics, while Grafana provides visualization and alerting on top of them.

Prometheus: Metrics Collection

Prometheus is an open-source monitoring and alerting toolkit designed for reliability. It periodically scrapes metrics from applications and Kubernetes components and stores them in its built-in time-series database.

Key Features

  • Pull-based model: Prometheus periodically scrapes HTTP endpoints exposed by its targets (by default /metrics).
  • Time-series database: Stores metrics with labels for flexible queries.
  • PromQL: A powerful query language for analyzing metrics.
  • Alertmanager: A companion service that receives alerts fired by Prometheus rules and routes them to email, Slack, PagerDuty, etc.
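To make the pull model, labels, and the Alertmanager handoff concrete, here is a minimal prometheus.yml sketch. The job name, the target backend:8080, the env label, and the rules path are hypothetical placeholders; the Alertmanager address assumes a Service named alertmanager on its default port 9093.

```yaml
# prometheus.yml: minimal sketch; all target names are hypothetical
global:
  scrape_interval: 15s       # how often Prometheus pulls each target
  evaluation_interval: 15s   # how often alerting/recording rules are evaluated

rule_files:
  - /etc/prometheus/rules/*.yml   # PromQL alert rules (hypothetical path)

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # firing alerts are sent here

scrape_configs:
  - job_name: backend            # attached to every series as job="backend"
    metrics_path: /metrics       # the default, shown for clarity
    static_configs:
      - targets: ["backend:8080"]
        labels:
          env: production        # extra label added to every scraped series
```

With this running, a PromQL query such as sum by (job) (rate(http_requests_total[5m])) would return per-job request rates, assuming the target exports an http_requests_total counter.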

YAML Example: Prometheus Deployment

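apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v2.53.0   # pin a tested version instead of :latest
        args:
        - --config.file=/etc/prometheus/prometheus.yml   # scrape config, mounted from a ConfigMap
        - --storage.tsdb.path=/prometheus                 # TSDB data directory
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config   # must contain a prometheus.yml key

Explanation: This runs a single Prometheus Pod that listens on port 9090 and reads its scrape configuration from a ConfigMap named prometheus-config. A containerPort is mostly informational; a Service is still needed to reach Prometheus by a stable name, and data in /prometheus is lost when the Pod is replaced unless it is backed by a persistent volume.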
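The Deployment alone neither makes Prometheus reachable nor tells it what to scrape. A minimal companion sketch follows: a Service named prometheus (matching the http://prometheus:9090 URL used in the interview task), a prometheus-config ConfigMap whose prometheus.yml key is meant to be mounted at /etc/prometheus, and RBAC so Prometheus can discover Pods through the Kubernetes API. The prometheus.io/scrape annotation is a widely used convention rather than a Kubernetes built-in, the default namespace is an assumption, and the Pod template also needs serviceAccountName: prometheus for the RBAC to apply.

```yaml
# Service: stable in-cluster name for Prometheus (http://prometheus:9090)
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - port: 9090
    targetPort: 9090
---
# ConfigMap: mount at /etc/prometheus in the Prometheus container
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod            # one target per declared container port
      relabel_configs:
      # keep only Pods annotated prometheus.io/scrape: "true" (convention, not built-in)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # copy namespace and Pod name into queryable labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
---
# RBAC: lets Prometheus list and watch Pods for service discovery
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default   # assumption: change to the namespace Prometheus runs in
```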

Grafana: Visualization and Dashboards

Grafana is a visualization tool that integrates with Prometheus to display metrics in interactive dashboards.

Key Features

  • Dashboards: Pre-built and custom dashboards for Kubernetes metrics.
  • Data Sources: Supports Prometheus, Loki, Elasticsearch, and more.
  • Alerting: Create alerts directly from dashboards.
  • User Management: Role-based access for teams.
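Besides building dashboards in the UI, Grafana can load dashboard JSON files at startup through its provisioning mechanism, which keeps dashboards in version control. A minimal provider sketch, assuming it is placed under /etc/grafana/provisioning/dashboards/ and that the JSON files live in /var/lib/grafana/dashboards; the provider name, folder name, and file paths are illustrative.

```yaml
# /etc/grafana/provisioning/dashboards/kubernetes.yaml (illustrative file name)
apiVersion: 1
providers:
  - name: kubernetes-dashboards   # arbitrary provider name
    folder: Kubernetes            # Grafana folder the dashboards appear in
    type: file
    disableDeletion: true         # provisioned dashboards cannot be deleted from the UI
    updateIntervalSeconds: 30     # rescan the directory for changed JSON files
    options:
      path: /var/lib/grafana/dashboards
```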

YAML Example: Grafana Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:11.0.0   # pin a tested version instead of :latest
        ports:
        - containerPort: 3000

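Explanation: This runs a single Grafana Pod listening on port 3000. The port is reachable from other Pods or a browser only through a Service or a port-forward, and dashboards saved in the UI live in the container's /var/lib/grafana directory, so they are lost when the Pod is replaced unless that path is backed by a persistent volume.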
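A minimal Service sketch for the Grafana Deployment, assuming internal-only access (ClusterIP) is acceptable; the name grafana is chosen here for illustration.

```yaml
# Service: stable in-cluster address for Grafana
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  type: ClusterIP      # internal only; use an Ingress or LoadBalancer for external access
  selector:
    app: grafana       # matches the Deployment's Pod labels
  ports:
  - port: 3000
    targetPort: 3000
```

For a quick look without exposing it, kubectl port-forward svc/grafana 3000:3000 makes the UI available at http://localhost:3000.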

Flowchart: Monitoring Workflow


   Kubernetes components and apps ---> Expose /metrics endpoints ---> Prometheus scrapes metrics
          |
          v
   Prometheus stores metrics ---> Grafana queries Prometheus (PromQL) ---> Dashboards visualize data
          |
          v
   Prometheus evaluates alert rules ---> Alertmanager notifies teams ---> Teams investigate and resolve
  

Real-Time Example

In a microservices-based e-commerce platform:

  • Prometheus: Scrapes metrics from frontend, backend, and database Pods.
  • Grafana: Displays dashboards showing request latency, error rates, and CPU usage.
  • Alertmanager: Sends alerts to Slack when error rates exceed thresholds.
  • Outcome: Teams detect issues early and maintain high availability.
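The Slack alert on error rates starts as a Prometheus alerting rule. A hedged sketch, assuming the services export a counter named http_requests_total with a status label holding the HTTP code (a common convention, not a given) and using an illustrative 5% threshold; the group name and severity value are also hypothetical.

```yaml
# rules/ecommerce.yml: loaded through rule_files in prometheus.yml
groups:
- name: ecommerce-availability
  rules:
  - alert: HighErrorRate
    # fraction of requests answered with 5xx over the last 5 minutes, per job
    expr: |
      sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
        /
      sum by (job) (rate(http_requests_total[5m]))
        > 0.05
    for: 10m               # condition must hold for 10 minutes before firing
    labels:
      severity: critical   # used by Alertmanager routing
    annotations:
      summary: "{{ $labels.job }} error rate above 5%"
      description: "5xx ratio is {{ $value | humanizePercentage }} for job {{ $labels.job }}."
```

The for: clause keeps short spikes from paging anyone, which also helps against alert fatigue.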

Common Mistakes

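  • Not deploying the Kubernetes Metrics Server (used by kubectl top and the Horizontal Pod Autoscaler) or, on the Prometheus side, exporters such as kube-state-metrics and node-exporter, leaving gaps in cluster visibility.
  • Ignoring label design: inconsistent label names make PromQL queries complex, and unbounded label values (user IDs, request IDs) multiply the number of time series and Prometheus memory use.
  • Overloading Grafana dashboards with too many metrics.
  • Setting alert thresholds too tight or without a for: duration, so brief spikes page people and lead to alert fatigue.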

Interview Notes

Q1: What is the role of Prometheus in Kubernetes?

Answer: Prometheus scrapes, stores, and queries metrics from Kubernetes components and applications.

Q2: How does Grafana complement Prometheus?

Answer: Grafana visualizes Prometheus metrics in dashboards and provides alerting capabilities.

Q3: What is Alertmanager?

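Answer: Alertmanager is a separate component of the Prometheus ecosystem. Prometheus evaluates alerting rules and sends firing alerts to Alertmanager, which deduplicates, groups, silences, and routes them to notification systems such as email, Slack, or PagerDuty.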
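A minimal alertmanager.yml sketch showing grouping and a Slack receiver. The receiver name, channel, and webhook URL are placeholders (a real Slack incoming-webhook URL must be substituted), and the timing values are illustrative rather than recommendations.

```yaml
# alertmanager.yml: minimal sketch with placeholder values
route:
  receiver: slack-oncall           # default receiver for all alerts
  group_by: ["alertname", "job"]   # one notification per alert name and job
  group_wait: 30s                  # wait to batch alerts that fire together
  group_interval: 5m               # wait before notifying about new alerts in an existing group
  repeat_interval: 4h              # re-send if the alert is still firing
receivers:
- name: slack-oncall
  slack_configs:
  - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook
    channel: "#oncall"
    send_resolved: true            # also notify when the alert clears
    title: '{{ .CommonAnnotations.summary }}'
```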

Q4: Example Interview Task

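# Grafana UI: add a Prometheus data source with these settings
- URL: http://prometheus:9090   (assumes a Service named prometheus in the same namespace)
- Access: Server

Explanation: This connects Grafana to Prometheus, enabling dashboards and queries. With Server access, Grafana's backend sends the queries, so the URL must resolve from inside the Grafana Pod rather than from the user's browser.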
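The same data source can be declared as a provisioning file so it exists as soon as Grafana starts, with no manual UI step. A sketch, assuming the official image's default provisioning directory and the same prometheus Service name; the file name is illustrative.

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml (illustrative file name)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # "Server" in the UI: Grafana's backend sends queries
    url: http://prometheus:9090   # assumes a Service named prometheus in the same namespace
    isDefault: true
```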

Advanced Notes

  • Loki Integration: Use Loki with Grafana for centralized logging.
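  • ServiceMonitors: With the Prometheus Operator, a ServiceMonitor custom resource declares which Services to scrape and on which port and path, so scrape targets are managed as Kubernetes objects rather than by editing prometheus.yml.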
  • Custom Metrics: Applications can expose custom metrics for business KPIs.
  • Best Practices: Use meaningful labels, configure alerts carefully, and combine metrics with logs for full observability.
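A ServiceMonitor sketch, assuming the Prometheus Operator CRDs are installed. The app: backend selector, the shop namespace, and the http-metrics port name are hypothetical; release: prometheus is a placeholder for whatever label the Prometheus resource's serviceMonitorSelector matches.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backend
  labels:
    release: prometheus      # placeholder: must match the serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: backend           # selects Services carrying this label (hypothetical)
  namespaceSelector:
    matchNames: ["shop"]     # hypothetical namespace to look in
  endpoints:
  - port: http-metrics       # name of the Service port, not a number
    path: /metrics
    interval: 30s
```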

Summary

Prometheus and Grafana form the backbone of metrics monitoring in Kubernetes, and adding Loki extends the same Grafana setup to logs. Prometheus collects and stores metrics and evaluates alert rules, Alertmanager routes the resulting notifications, and Grafana visualizes the data in dashboards and can run its own alerts. Together, they provide deep observability, enabling teams to detect issues early, optimize performance, and maintain resilient applications. Familiarity with these tools is valuable for production-grade Kubernetes deployments and a frequent topic in interviews.