Correlating Metrics and Logs in Grafana
An Operational SRE Guide to Configuring Data Links, Derived Fields, and Exemplar Pipelines for Context-Aware Troubleshooting.
Executive Summary & Core Concepts
In high-throughput enterprise environments, metrics and logs often exist as isolated data silos. When an automated alert triggers due to a spike in error rates, engineers frequently lose valuable time manually copying timestamps, application names, and error IDs from a Prometheus dashboard into a separate log viewer. Correlating metrics and logs directly within Grafana removes this friction, turning disjointed data streams into an integrated troubleshooting workflow.
This correlation is achieved in two ways. The first is to anchor both data types to a shared metadata standard, such as identical Kubernetes namespace and application labels on metric series and log streams. The second is to inject unique transaction IDs (trace IDs) into your metrics and log lines. By configuring Grafana Data Links, Derived Fields, and OpenTelemetry or OpenMetrics exemplars, teams can click an anomalous metric spike and jump straight to the log lines or distributed traces generated by that specific event, which accelerates root-cause analysis.
- Data Links: Dynamic, context-aware URL properties attached to Grafana panels that pass time ranges and label values directly into target dashboards or Split View explorer panes.
- Derived Fields: Query-time regex extraction rules configured on the Loki data source. They parse the log lines returned by a query and turn the captured value into an internal link (to another data source) or an external URL.
- Exemplars: Label sets, typically a trace ID, that are attached to individual metric samples in the OpenMetrics exposition and stored by Prometheus at scrape time. They let a single point on a metric graph reference the specific trace, and through it the logs, that produced it.
- Split View Workspace: A Grafana Explore mode that places metrics in the left pane and correlated logs in the right pane. It offers an option to synchronize their time ranges for side-by-side inspection.
Configuring Data Links for Metric-to-Log Transitions
Data Links allow you to bind Prometheus labels to dynamic URLs. This lets you click on a metric line inside a Grafana panel and instantly open a matching Loki log query tailored to that exact application instance.
1. The Data Link Variable Engine
When defining a data link, Grafana exposes special variables that capture the current time window and active label states of the dashboard panel:
- ${__series.name}: Expands to the name of the active metric series.
- ${__field.labels.instance}: Extracts the value of the `instance` label for the clicked data point. Replace `instance` with any other label name.
- ${__from} and ${__to}: Capture the start and end of the dashboard's current time range as epoch-millisecond timestamps.
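To make the substitution concrete, the following minimal Python sketch shows how data-link placeholders could be expanded for one clicked data point. It is not Grafana's implementation. The label values, the time range, and the `interpolate` helper are hypothetical, and the sketch assumes the `${__field.labels.<name>}` form for label variables.

```python
import re

# Hypothetical clicked data point: the series labels plus the panel's time range.
# Grafana would supply the real values at click time.
SERIES_LABELS = {"app": "checkout-api", "environment": "production", "instance": "10.0.4.12:9100"}
TIME_RANGE_MS = {"__from": 1717072500000, "__to": 1717072800000}

# Matches ${...} placeholders such as ${__field.labels.app} or ${__from}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def interpolate(template: str, labels: dict, time_range: dict) -> str:
    """Sketch of data-link variable expansion (illustrative, not Grafana's code)."""

    def resolve(match: re.Match) -> str:
        name = match.group(1)
        if name.startswith("__field.labels."):
            # In this sketch, a label the series does not carry expands to "".
            return labels.get(name[len("__field.labels."):], "")
        if name in time_range:
            return str(time_range[name])
        return match.group(0)  # leave variables this sketch does not model untouched

    return PLACEHOLDER.sub(resolve, template)


print(interpolate("instance=${__field.labels.instance} from=${__from} to=${__to}",
                  SERIES_LABELS, TIME_RANGE_MS))
```

In this sketch, a label that the series does not carry expands to an empty string. This is one way a mismatched label name can quietly produce an empty LogQL selector value instead of an obvious error.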
2. Step-by-Step Dashboard Panel Provisioning
To create a context-aware link from a Prometheus metric graph to your Loki logs, apply the following configuration to your Grafana panel's **Data Links** settings:
```
# Title Configuration String
View Matching Production Logs
# Contextual Link URL (${__from}/${__to} are absolute epoch-millisecond timestamps)
/explore?left=["${__from}","${__to}","Loki",{"expr":"{app=\"${__field.labels.app}\",environment=\"${__field.labels.environment}\"} |~ \"(?i)error|fail\""}]
```
When an operator clicks a metric line, this configuration automatically launches Grafana's Explore view, matches the exact app and environment labels from the metric, and filters the log streams for errors within the same time window.
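Hand-writing the escaped JSON in that URL is error-prone. The following Python sketch builds the same link programmatically. It assumes the legacy `left=[from, to, datasource, {query}]` Explore format used in the link above. `json.dumps` escapes the inner LogQL quotes, and `urllib.parse.quote` percent-encodes the result. The `loki_explore_url` name and the sample values are hypothetical. Newer Grafana releases use a different Explore URL layout, so treat this as an illustration of the escaping rather than a stable API.

```python
import json
from urllib.parse import quote, unquote


def loki_explore_url(app: str, environment: str, from_ms: int, to_ms: int,
                     datasource: str = "Loki") -> str:
    """Build a legacy-format Explore link: left=[from, to, datasource, {query}]."""
    # LogQL stream selector plus a case-insensitive line filter for errors.
    logql = f'{{app="{app}",environment="{environment}"}} |~ "(?i)error|fail"'
    # json.dumps escapes the inner double quotes, so the array stays valid JSON.
    state = json.dumps([str(from_ms), str(to_ms), datasource, {"expr": logql}],
                       separators=(",", ":"))
    # Percent-encode the JSON so braces, quotes, and pipes survive in the query string.
    return "/explore?left=" + quote(state, safe="")


url = loki_explore_url("checkout-api", "production", 1717072500000, 1717072800000)
print(url)

# Round trip: decoding the parameter recovers the pane state, including the LogQL query.
pane_state = json.loads(unquote(url.split("left=", 1)[1]))
print(pane_state[3]["expr"])
```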
Configuring Loki Derived Fields for Deep Correlation
While data links connect metric panels to log views, **Derived Fields** operate in the opposite direction. They are configured at the Loki data source level and use regular expressions to parse the log lines returned by a query. They identify unique IDs (like a trace_id or order_id) and transform them into actionable links.
Data Source Configuration Manifest
The following example shows how to configure a Derived Field inside your Grafana Loki data source YAML file (/etc/grafana/provisioning/datasources/loki.yaml) to automatically bind log lines to distributed tracing tools like Jaeger or Tempo:
```yaml
# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-querier.internal.net:3100
    jsonData:
      derivedFields:
        - name: TraceID
          # Matches standard log formats: trace_id=4f82b1a9c0f3d6a2
          # The single capture group becomes the derived field's value.
          matcherRegex: "trace_id=([a-f0-9]+)"
          # Internal link: the captured value is run as a query against the tracing
          # data source with this UID ("tempo" is an assumed UID; use your own).
          datasourceUid: tempo
          # "$$" escapes "$" so provisioning does not expand it as an environment variable.
          url: "$${__value.raw}"
          urlDisplayLabel: "Track Distributed Trace"
```
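The following Python sketch shows what the matcher does with each returned line. It applies a pattern equivalent to `trace_id=([a-f0-9]+)` and keeps the single capture group as the derived value. The character class must cover the full hexadecimal range `a-f0-9`. A class such as `[a-f0-8]` would stop at the first `9` and truncate `4f82b1a9c0f3d6a2` to `4f82b1a`. The `derive_trace_ids` helper and the sample log lines are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# One capture group: its match becomes the derived field value (the trace ID).
TRACE_ID_PATTERN = re.compile(r"trace_id=([a-f0-9]+)")


def derive_trace_ids(lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each log line with its extracted trace ID (None when the regex does not match)."""
    results = []
    for line in lines:
        match = TRACE_ID_PATTERN.search(line)
        results.append((line, match.group(1) if match else None))
    return results


sample_logs = [
    'level=error msg="payment declined" trace_id=4f82b1a9c0f3d6a2',
    'level=info msg="cart updated"',
]
for line, trace_id in derive_trace_ids(sample_logs):
    # Lines without a match would get no link; here they print "-".
    print(trace_id or "-", "|", line)
```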
Advanced Ingestion Correlation via Prometheus Exemplars
The most precise way to correlate metrics and logs is through **Exemplars**. Exemplars attach specific metadata references, such as a Trace ID or Log Signature, directly to your metric samples as they are collected by Prometheus.
OpenMetrics Exemplar Format Representation:
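```
# TYPE http_requests counter
http_requests_total{method="POST",status="500"} 1425.0 # {trace_id="4f82b1a9c0f3d6a2"} 1.0 1717072632.105
# EOF
```
In this OpenMetrics text, as exposed by an application's metrics endpoint, the exemplar payload is everything after ` # `. It consists of the label set `{trace_id="4f82b1a9c0f3d6a2"}`, the exemplar's own value (`1.0`), and an optional timestamp (`1717072632.105`, in Unix seconds).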
When Prometheus scrapes this endpoint in the OpenMetrics format and runs with `--enable-feature=exemplar-storage`, it keeps the `trace_id` exemplar alongside that series in a separate, fixed-size exemplar store. With exemplars enabled on the query, Grafana draws them as markers overlaid on your metric lines. Hovering over or clicking an exemplar marker reveals the trace ID recorded for that sample. If the Prometheus data source is configured to link exemplar trace IDs to a tracing data source, you can also jump from a high-level metric chart straight to the corresponding trace and its logs.
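The structure of an exemplar line can be illustrated with a simplified Python parser. This is a sketch, not Prometheus's scraper. It assumes the exemplar follows ` # ` as `{labels} value [timestamp]`, and it ignores escaped quotes or commas inside label values, which a real OpenMetrics parser must handle. `parse_exemplar` and the sample line are hypothetical.

```python
import re
from typing import Dict, NamedTuple, Optional


class Exemplar(NamedTuple):
    labels: Dict[str, str]
    value: float
    timestamp: Optional[float]


# Simplified grammar: <sample> # {<labels>} <value> [<timestamp>]
EXEMPLAR_RE = re.compile(r"#\s*\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+(?P<ts>\S+))?\s*$")
# name="value" pairs; does not handle escaped quotes inside values.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exemplar(line: str) -> Optional[Exemplar]:
    """Return the exemplar attached to a sample line, or None if there is none."""
    match = EXEMPLAR_RE.search(line)
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels")))
    ts = match.group("ts")
    return Exemplar(labels, float(match.group("value")), float(ts) if ts else None)


line = ('http_requests_total{method="POST",status="500"} 1425.0 '
        '# {trace_id="4f82b1a9c0f3d6a2"} 1.0 1717072632.105')
print(parse_exemplar(line))
```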
Technical Interview Questions & Detailed Answers
Q1: Explain the functional differences between Data Links and Derived Fields within the Grafana ecosystem. When should an engineer apply each?
Answer: The core difference lies in the direction of the correlation and where the configuration is applied:
- Data Links: Configured at the **Dashboard Panel** layer. They extract metadata from metric series labels and time windows, generating external URLs or explorer links. Engineers should use Data Links to build top-down navigation paths, allowing operators to click an anomalous metric line on a high-level chart and jump directly to relevant logs or traces.
- Derived Fields: Configured at the **Data Source** layer. They use query-time regular expressions to parse raw text lines within logs, extracting values like a `trace_id` or `transaction_id` and wrapping them in internal hyperlinks. Engineers should use Derived Fields to enable bottom-up debugging, allowing an operator who finds an error log to instantly pivot to a distributed trace or a specialized business intelligence dashboard.
Q2: Why is aligning metadata labels between Prometheus scrape configurations and Promtail log pipelines critical for maintaining dashboard query performance?
Answer: Aligning metadata labels is critical because Grafana relies on exact string substitution to pass parameters between dashboards and data sources. If your Prometheus targets use the label format {app="checkout-api", env="production"} but your Promtail configurations map the same service as {service="checkout", environment="prod"}, automated data links will break or generate empty queries.
Furthermore, matching label sets let Loki use its label index to narrow a query to the relevant log streams. If your data link queries Loki with indexed labels (like app and env), Loki only fetches the chunks belonging to the matching streams. If mismatched labels force your data links to rely on broad stream selectors combined with line filters (like |= "checkout-api"), Loki must scan far more log data. This degrades query performance, burdens cluster resources, and slows down dashboard responses.
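A pre-flight check can catch this mismatch before a data link ships. The Python sketch below compares the correlation labels on a metric series with those on a Loki stream, using the label sets from the example above. `selector_mismatches` is a hypothetical helper. In practice, the label sets could be pulled from the Prometheus and Loki HTTP APIs (for example, Loki's `/loki/api/v1/labels` endpoint) rather than hard-coded.

```python
from typing import Dict, Iterable, List

# Label sets from the example above (illustrative values).
PROMETHEUS_LABELS = {"app": "checkout-api", "env": "production"}
LOKI_STREAM_LABELS = {"service": "checkout", "environment": "prod"}


def selector_mismatches(metric_labels: Dict[str, str],
                        loki_labels: Dict[str, str],
                        keys: Iterable[str]) -> List[str]:
    """List the correlation keys whose metric value would not select this Loki stream."""
    problems = []
    for key in keys:
        if key not in loki_labels:
            problems.append(f"{key}: label missing on Loki streams")
        elif loki_labels[key] != metric_labels.get(key):
            problems.append(
                f"{key}: metric has {metric_labels.get(key)!r}, Loki has {loki_labels[key]!r}")
    return problems


# Keys the data link substitutes into its LogQL selector.
for problem in selector_mismatches(PROMETHEUS_LABELS, LOKI_STREAM_LABELS, ["app", "env"]):
    print(problem)
```

An empty result means every label the link substitutes exists on the Loki side with the same value, so the generated selector can use the index directly.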
Summary
Correlating metrics and logs in Grafana bridges the gap between high-level system alerts and deep, contextual log data. By leveraging Data Links, Derived Fields, and Exemplars, platform teams can design an integrated observability workflow where dashboards safely hand off metadata context across data sources. This deep integration simplifies debugging, eliminates manual data searching during incidents, and enables engineering teams to root-cause production anomalies with minimal friction.