Implementing Distributed Tracing with Micrometer and Zipkin

Modern microservices architectures introduce enormous operational complexity. A single client request may travel through API gateways, authentication services, business services, databases, caches, message brokers, and external APIs before a response is returned.

In monolithic applications, debugging a failed request was relatively simple because all the logs came from a single application process. In distributed systems, a single request can cross many services, each writing its own logs.

This creates one of the biggest challenges in cloud-native engineering:

How do you trace a request across multiple distributed microservices?

This is where distributed tracing becomes essential.

Distributed tracing allows engineers to follow an individual request as it moves across services, databases, queues, and infrastructure components. It helps identify bottlenecks, latency issues, failures, retries, and service dependencies.

In this guide, you will learn how distributed tracing works internally, how Micrometer Tracing integrates with Spring Boot 3, how Zipkin collects and visualizes traces, and how to run tracing in production.

What You Will Learn

What distributed tracing is
Why distributed tracing is critical in microservices
How traces and spans work internally
What trace IDs and span IDs are
How Micrometer Tracing works in Spring Boot
How Zipkin collects distributed traces
How request propagation works between services
How tracing integrates with REST APIs and messaging systems
How to configure tracing in Spring Boot 3
How to debug distributed systems using tracing
How sampling works
Production best practices for observability
Security and performance considerations
How enterprises use tracing platforms

What Is Distributed Tracing?

Distributed tracing is an observability technique used to track requests across multiple distributed services.

Each incoming request receives a unique trace identifier. As the request moves between services, that trace ID propagates across the system.


Client Request
|
v
API Gateway
|
v
User Service
|
v
Order Service
|
v
Payment Service
|
v
Notification Service

Without distributed tracing:

Logs become disconnected
Root cause analysis becomes difficult
Latency bottlenecks remain hidden
Cross-service failures are hard to diagnose

With distributed tracing:

Every request is traceable end-to-end
Latency breakdowns become visible
Dependency chains become observable
Failures can be identified quickly

Why Distributed Tracing Matters in Microservices

As systems scale, requests become increasingly complex.

Typical Enterprise Request Flow


User Request
|
v
API Gateway
|
+------ Authentication Service
|
+------ User Service
|
+------ Inventory Service
|
+------ Order Service
|
+------ Payment Service
|
+------ Kafka Event
|
+------ Notification Service

A single request may involve:

REST calls
Database queries
Message queues
External APIs
Caches
Retries
Circuit breakers

Without observability, debugging production failures becomes nearly impossible.

Core Concepts: Trace and Span

What Is a Trace?

A trace represents the entire lifecycle of a request across distributed systems.

Every request receives a unique Trace ID.

What Is a Span?

A span represents a single operation inside a trace.

Examples:

HTTP request processing
Database query
Kafka message publishing
External API call

Trace Hierarchy


Trace ID: abc123

Span 1: API Gateway
|
+---- Span 2: User Service
|
+---- Span 3: Order Service
|
+---- Span 4: Database Query
|
+---- Span 5: Payment API Call

Each span contains:

Span ID
Parent Span ID
Operation name
Start timestamp
End timestamp
Tags and metadata
Error information
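
To make the fields above concrete, here is a minimal sketch of a span as a plain Java record. It is not Micrometer's or Zipkin's data model; the names SpanRecord and SpanIds are hypothetical and only mirror the list above. It assumes 128-bit trace IDs and 64-bit span IDs encoded as lowercase hex, which Zipkin accepts.

```java
import java.security.SecureRandom;
import java.util.HexFormat;
import java.util.Map;

/** Illustrative span model; field names mirror the list above, not a library API. */
record SpanRecord(
        String traceId,        // shared by every span in the trace
        String spanId,         // unique per operation
        String parentSpanId,   // null for the root span
        String name,
        long startEpochMicros,
        long endEpochMicros,
        Map<String, String> tags,
        String error) {        // null when the operation succeeded

    long durationMicros() {
        return endEpochMicros - startEpochMicros;
    }

    boolean isRoot() {
        return parentSpanId == null;
    }
}

final class SpanIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    private SpanIds() {}

    /** Returns the given number of random bytes as lowercase hex (16 bytes -> 32 chars). */
    static String randomHex(int bytes) {
        byte[] buffer = new byte[bytes];
        RANDOM.nextBytes(buffer);
        return HexFormat.of().formatHex(buffer);
    }

    /** Starts a new trace: fresh trace ID, fresh span ID, no parent. */
    static SpanRecord root(String name, long start, long end) {
        return new SpanRecord(randomHex(16), randomHex(8), null, name, start, end, Map.of(), null);
    }

    /** Continues the parent's trace: same trace ID, new span ID, parent link set. */
    static SpanRecord childOf(SpanRecord parent, String name, long start, long end) {
        return new SpanRecord(parent.traceId(), randomHex(8), parent.spanId(),
                name, start, end, Map.of(), null);
    }
}
```

Calling SpanIds.childOf(root, "order-service", ...) yields a span that shares the root's trace ID and points back to it through parentSpanId, which is how a tracing backend can rebuild the hierarchy.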

Understanding Micrometer Tracing

Spring Boot 3 introduced Micrometer Tracing as the modern replacement for Spring Cloud Sleuth.

Micrometer provides:

Distributed tracing instrumentation
Metrics integration
Observation APIs
Context propagation
OpenTelemetry compatibility

Micrometer Tracing Architecture


Incoming Request
|
v
Micrometer Observation API
|
v
Trace + Span Creation
|
v
Context Propagation
|
v
Zipkin Exporter
|
v
Zipkin Server

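With the Actuator starter and a tracing bridge on the classpath, Spring Boot auto-configures tracing for incoming HTTP requests and for HTTP clients built from its auto-configured builders, so most applications need very little setup.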

What Is Zipkin?

Zipkin is a distributed tracing system used for collecting and visualizing traces.

It provides:

Trace visualization
Latency analysis
Dependency graphs
Error tracking
Performance bottleneck analysis

Zipkin Architecture


Microservices
|
v
Micrometer Tracing
|
v
Zipkin Exporter
|
v
Zipkin Server
|
v
Trace Storage
|
v
Zipkin UI

Setting Up Zipkin

Run Zipkin Using Docker


docker run -d -p 9411:9411 openzipkin/zipkin

Zipkin Dashboard

After startup:


http://localhost:9411

The dashboard allows engineers to:

Search traces
Analyze request timelines
Inspect spans
Identify slow services

Spring Boot Micrometer Dependencies

Maven Dependencies



<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>

<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>

These dependencies enable:

Tracing instrumentation
Brave tracing bridge
Zipkin integration
Actuator observability

Configuring Micrometer Tracing

application.yml Configuration


spring:
  application:
    name: order-service

management:
  tracing:
    sampling:
      probability: 1.0
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
  endpoints:
    web:
      exposure:
        include: health,info,prometheus

Important Configuration Properties

Property	Purpose
spring.application.name	Service name shown in Zipkin
management.tracing.sampling.probability	Fraction of traces to record (0.0 to 1.0)
management.zipkin.tracing.endpoint	Zipkin collector endpoint
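
The collector endpoint configured above accepts spans as a JSON array over HTTP POST. The sketch below builds such a payload by hand and prepares the request without sending it, to show roughly what a reporter transmits. The class name ZipkinPayloadSketch and the sample values are hypothetical; only a few fields of Zipkin's v2 span format are included, and the values are assumed to need no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

final class ZipkinPayloadSketch {
    private ZipkinPayloadSketch() {}

    /**
     * Builds a one-element JSON array in Zipkin's v2 span format.
     * Timestamp and duration are expressed in microseconds.
     */
    static String spanJson(String traceId, String spanId, String name,
                           String serviceName, long timestampMicros, long durationMicros) {
        return String.format(Locale.ROOT,
                "[{\"traceId\":\"%s\",\"id\":\"%s\",\"name\":\"%s\","
                        + "\"timestamp\":%d,\"duration\":%d,"
                        + "\"localEndpoint\":{\"serviceName\":\"%s\"}}]",
                traceId, spanId, name, timestampMicros, durationMicros, serviceName);
    }

    /** Prepares (but does not send) the POST a reporter would issue to the collector. */
    static HttpRequest postRequest(String endpoint, String json) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

In a real application the Zipkin reporter batches spans and sends them in the background, so code like this is useful mainly for understanding the wire format or for smoke-testing a collector.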

Request Flow with Distributed Tracing


Client Request
|
v
Gateway Service
|
| Trace ID Generated
|
v
Order Service
|
| Trace ID Propagated
|
v
Payment Service
|
| Trace ID Propagated
|
v
Database Query
|
v
Response Returned

Every service receives the same trace ID while generating unique span IDs.

Understanding Context Propagation

One of the most critical features in distributed tracing is context propagation.

Context propagation ensures:

Trace IDs move across services
Spans remain connected
Request lineage remains intact

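HTTP Headers Used

Which headers carry the trace context depends on the configured propagation format. Spring Boot 3 defaults to the W3C Trace Context format, which uses a single header:

traceparent

Zipkin's own B3 format can be selected with the management.tracing.propagation.type property (b3, or b3_multi for the multi-header variant in newer Spring Boot 3 releases). The multi-header variant uses:

X-B3-TraceId
X-B3-SpanId
X-B3-ParentSpanId
X-B3-Sampled

Instrumented HTTP clients add these headers to outgoing requests, and instrumented servers read them from incoming requests, without any application code.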
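
As a rough illustration of what the instrumentation does with the B3 headers listed above, the sketch below writes a trace context into a header map, reads it back on the receiving side, and derives the context for the next outbound call. B3Context is a hypothetical type, not part of Micrometer or Brave, and the lookup is case-sensitive for brevity even though real HTTP header names are not.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative trace context; Micrometer and Brave have their own types. */
record B3Context(String traceId, String spanId, String parentSpanId, boolean sampled) {

    /** Writes this context into outgoing request headers. */
    Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
        return headers;
    }

    /** Reads the caller's context from incoming headers, or returns null if none was sent. */
    static B3Context fromHeaders(Map<String, String> headers) {
        String traceId = headers.get("X-B3-TraceId");
        String spanId = headers.get("X-B3-SpanId");
        if (traceId == null || spanId == null) {
            return null;
        }
        return new B3Context(traceId, spanId, headers.get("X-B3-ParentSpanId"),
                "1".equals(headers.get("X-B3-Sampled")));
    }

    /** Context for the next hop: same trace, new span ID, current span becomes the parent. */
    B3Context child(String newSpanId) {
        return new B3Context(traceId, newSpanId, spanId, sampled);
    }
}
```

The trace ID and the sampling decision are copied unchanged from hop to hop, while the span ID changes and the previous span ID moves into the parent field. That is what keeps spans connected across services.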

Sample REST Controller with Tracing


package com.example.orderservice.controller;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    @GetMapping("/orders")
    public String getOrders() {

        return "Orders Retrieved Successfully";
    }

}

Spring Boot's server instrumentation creates a span for every incoming HTTP request, so the controller needs no tracing code of its own.

Creating Custom Spans

Engineers can create manual spans for important business operations.

Custom Span Example


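package com.example.orderservice.service;

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;

import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private final Tracer tracer;

    public PaymentService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void processPayment() {

        // Create a child of the current span (or a new trace if none is active).
        Span span = tracer.nextSpan()
                .name("payment-processing")
                .start();

        // Put the span in scope so nested instrumentation and log correlation see it.
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {

            Thread.sleep(500); // placeholder for the real payment call

        } catch (InterruptedException ex) {

            Thread.currentThread().interrupt(); // preserve the interrupt flag
            span.error(ex);

        } catch (RuntimeException ex) {

            span.error(ex); // record the failure on the span, then let it propagate
            throw ex;

        } finally {

            span.end();
        }
    }

}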

Custom spans help trace critical business workflows.

Distributed Tracing with RestTemplate

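Spring Boot propagates the tracing context through HTTP clients that it instruments. A RestTemplate created with new RestTemplate() bypasses that instrumentation, so build it from the auto-configured RestTemplateBuilder instead.

RestTemplate Configuration


@Bean
public RestTemplate restTemplate(RestTemplateBuilder builder) {
    // The builder applies Spring Boot's observation support, which adds the trace headers.
    return builder.build();
}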

Service Call Example


String response = restTemplate.getForObject(
        "http://payment-service/payments",
        String.class
);

Trace headers propagate automatically.

Distributed Tracing with OpenFeign

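Spring Cloud OpenFeign clients can take part in traces as well. The integration relies on Micrometer's observation support for Feign, which typically means adding the io.github.openfeign:feign-micrometer dependency.

Feign Client Example


import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;

@FeignClient(name = "payment-service")
public interface PaymentClient {

    @GetMapping("/payments")
    String getPayments();

}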

Trace context automatically flows between services.

Tracing Kafka and Event-Driven Systems

Distributed tracing becomes even more important in asynchronous systems.

Kafka Trace Flow


Order Service
|
v
Publish Kafka Event
|
v
Kafka Broker
|
v
Inventory Service
|
v
Notification Service

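The trace context travels inside Kafka message headers. With Spring for Apache Kafka, observation is disabled by default and must be enabled on both the KafkaTemplate and the listener container (setObservationEnabled(true)) before spans are created and headers are propagated.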
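
The sketch below shows the idea of carrying trace context in message headers, using the W3C traceparent format as an example. Kafka header values are byte arrays, so the record headers are modelled here as a plain Map of String to byte[] rather than Kafka's own types. TraceparentSketch is a hypothetical helper, and which header format your services actually use depends on the configured propagation type.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class TraceparentSketch {
    private TraceparentSketch() {}

    /** Formats a W3C traceparent value: version-traceId-parentId-flags. */
    static String format(String traceId, String spanId, boolean sampled) {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    /** Producer side: attach the context to the message headers. */
    static Map<String, byte[]> inject(String traceId, String spanId, boolean sampled) {
        Map<String, byte[]> headers = new HashMap<>();
        headers.put("traceparent",
                format(traceId, spanId, sampled).getBytes(StandardCharsets.UTF_8));
        return headers;
    }

    /**
     * Consumer side: returns {traceId, parentSpanId, flags},
     * or null if the header is absent or malformed.
     */
    static String[] extract(Map<String, byte[]> headers) {
        byte[] raw = headers.get("traceparent");
        if (raw == null) {
            return null;
        }
        String[] parts = new String(raw, StandardCharsets.UTF_8).split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            return null;
        }
        return new String[] {parts[1], parts[2], parts[3]};
    }
}
```

Because the context rides with the message itself, the consumer can continue the producer's trace even when it processes the event much later or on a different host.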

Why This Matters

Tracks async event chains
Identifies slow consumers
Debugs event processing failures
Improves observability in event-driven architectures

Sampling in Distributed Tracing

Tracing every request in high-traffic systems can become extremely expensive.

Sampling controls how many traces are collected.

Sampling Example


management:
  tracing:
    sampling:
      probability: 0.1

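This records roughly 10% of traces. The sampling decision is made when a trace starts and is propagated downstream, so a trace is either recorded in full or not at all.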
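
One way to picture a probability-based sampler is as a deterministic function of the trace ID, so every service that evaluates it reaches the same decision for the same trace. The sketch below is an illustration under that assumption, not the sampler that Brave or Micrometer actually ships; ProbabilitySamplerSketch and its bucket count are hypothetical.

```java
final class ProbabilitySamplerSketch {
    private static final long BUCKETS = 10_000L;
    private final long threshold;

    ProbabilitySamplerSketch(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        // 0.1 -> 1000 of the 10000 buckets are sampled.
        this.threshold = Math.round(probability * BUCKETS);
    }

    /**
     * Maps the trace ID to a bucket and samples it if the bucket is below the threshold.
     * The same trace ID always yields the same decision.
     */
    boolean isSampled(long traceIdLow64) {
        return Math.floorMod(traceIdLow64, BUCKETS) < threshold;
    }
}
```

With randomly generated trace IDs, the fraction of sampled traces approaches the configured probability over time.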

Why Sampling Matters

Reduces storage costs
Improves performance
Limits network overhead
Prevents observability platform overload

Understanding Trace Visualization in Zipkin

Zipkin provides waterfall-style trace visualizations.


Trace ID: 981273981273

Gateway Service      [=========]
User Service              [====]
Order Service                 [=========]
Payment Service                   [======]
Database Query                      [==]

Engineers can instantly identify:

Slow services
Latency bottlenecks
Retry storms
Timeout chains
Failed spans
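
A waterfall view is essentially each span's start offset and duration drawn on a shared time axis. The following sketch renders a text version of that idea. TimedSpan and WaterfallSketch are hypothetical names, and the offsets and durations passed in are assumed to be relative to the start of the trace.

```java
import java.util.List;

/** A span reduced to what the waterfall needs: a label, a start offset and a duration. */
record TimedSpan(String name, int startMs, int durationMs) {}

final class WaterfallSketch {
    private WaterfallSketch() {}

    /** Renders one line per span: padded name, indent for the start offset, bar for the duration. */
    static String render(List<TimedSpan> spans, int msPerChar) {
        StringBuilder out = new StringBuilder();
        for (TimedSpan span : spans) {
            int offset = span.startMs() / msPerChar;
            int width = Math.max(1, span.durationMs() / msPerChar); // always draw at least one '='
            out.append(String.format("%-16s", span.name()))
               .append(" ".repeat(offset))
               .append("[").append("=".repeat(width)).append("]")
               .append("\n");
        }
        return out.toString();
    }
}
```

Spans that start later are shifted to the right and longer spans get wider bars, which is why slow operations and serialized call chains stand out in this kind of view.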

Production Observability Architecture

                  +----------------------+
                  |    Client Requests   |
                  +----------+-----------+
                             |
                             v
                  +----------------------+
                  |  API Gateway         |
                  +----------+-----------+
                             |
       +---------------------+----------------------+
       |                     |                      |
       v                     v                      v
+-------------+      +---------------+     +----------------+
| User Svc    |      | Order Service |     | Payment Svc    |
+------+------+      +-------+-------+     +--------+-------+
       |                     |                      |
       +---------------------+----------------------+
                             |
                             v
                  +----------------------+
                  | Micrometer Tracing   |
                  +----------+-----------+
                             |
                             v
                  +----------------------+
                  | Zipkin Collector     |
                  +----------+-----------+
                             |
                             v
                  +----------------------+
                  | Zipkin Dashboard     |
                  +----------------------+

Debugging Production Issues Using Tracing

Example Production Incident

Users complain that checkout requests take 8 seconds.

Tracing Analysis


Gateway Service       50ms
Order Service        120ms
Payment Service     6500ms
Database Query      6200ms

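The trace shows the bottleneck immediately: Payment Service accounts for most of the request time, and nearly all of that is a single 6200 ms database query.

Without tracing, finding this by correlating logs across services could take hours.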
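
The same analysis can be expressed as a small calculation: a span's self time is its own duration minus the time spent in its direct children, and the span with the largest self time is the first place to look. The sketch below assumes sequential children and unique span names. SpanTiming and BottleneckSketch are hypothetical, and apart from the 8 second total, the 6500 ms payment span and the 6200 ms query, the figures in the usage note are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A span reduced to its name, its parent's name (null for the root) and its total duration. */
record SpanTiming(String name, String parent, long durationMs) {}

final class BottleneckSketch {
    private BottleneckSketch() {}

    /** Self time = a span's duration minus the durations of its direct children. */
    static Map<String, Long> selfTimes(List<SpanTiming> spans) {
        Map<String, Long> self = new HashMap<>();
        for (SpanTiming span : spans) {
            self.merge(span.name(), span.durationMs(), Long::sum);
        }
        for (SpanTiming span : spans) {
            if (span.parent() != null) {
                self.merge(span.parent(), -span.durationMs(), Long::sum);
            }
        }
        return self;
    }

    /** Name of the span with the largest self time. */
    static String slowest(List<SpanTiming> spans) {
        return selfTimes(spans).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }
}
```

For a trace with gateway (8000 ms), order-service (7800 ms, child of gateway), payment-service (6500 ms, child of order-service) and database-query (6200 ms, child of payment-service), the self times are 200, 1300, 300 and 6200 ms, so slowest returns database-query.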

Performance Considerations

Distributed tracing introduces overhead.

Potential Costs

Extra network traffic
Additional storage usage
Serialization overhead
Context propagation costs

Optimization Strategies

Use sampling
Limit span metadata
Avoid excessive custom spans
Use asynchronous exporters
Compress trace payloads

Security Considerations

Tracing systems can accidentally expose sensitive data.

Never Store

Passwords
JWT tokens
Credit card data
Personally identifiable information

Best Practices

Sanitize logs and spans
Encrypt observability traffic
Restrict dashboard access
Apply retention policies
Mask confidential attributes
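
Masking can be as simple as filtering span tags before they are attached. The sketch below replaces the value of any tag whose key looks sensitive. TagSanitizerSketch and its denylist are hypothetical; a real list should come from your own data classification policy, and key-based matching will not catch secrets embedded in otherwise harmless values such as URLs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TagSanitizerSketch {
    // Hypothetical denylist of key fragments that mark a tag as sensitive.
    private static final Set<String> SENSITIVE_FRAGMENTS =
            Set.of("password", "token", "authorization", "card", "ssn", "email");

    private TagSanitizerSketch() {}

    /** Returns a copy of the tags with values masked for keys containing a sensitive fragment. */
    static Map<String, String> sanitize(Map<String, String> tags) {
        Map<String, String> clean = new LinkedHashMap<>();
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            String key = tag.getKey().toLowerCase(Locale.ROOT);
            boolean sensitive = SENSITIVE_FRAGMENTS.stream().anyMatch(key::contains);
            clean.put(tag.getKey(), sensitive ? "***" : tag.getValue());
        }
        return clean;
    }
}
```

Keeping the key and masking only the value preserves the shape of the span, so dashboards and queries that expect the tag continue to work.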

Micrometer Tracing vs Spring Cloud Sleuth

Feature	Spring Cloud Sleuth	Micrometer Tracing
Spring Boot Version	Spring Boot 2.x	Spring Boot 3.x
Status	Not supported on Spring Boot 3	Recommended
OpenTelemetry Support	Limited	Via micrometer-tracing-bridge-otel
Observation API	No	Yes

Enterprise Best Practices

Always propagate trace context
Use standardized span naming
Monitor high-latency spans
Combine tracing with metrics and logs
Implement centralized observability
Use correlation IDs consistently
Apply intelligent trace sampling
Monitor observability platform health

Common Production Problems

Missing Trace Context

Requests lose trace IDs between services.

Broken Span Relationships

Parent-child relationships become disconnected.

Excessive Span Volume

High traffic overwhelms tracing infrastructure.

Incorrect Sampling Configuration

Critical traces may be lost accidentally.

Large Span Payloads

Excessive metadata impacts performance.

Real-World Enterprise Use Cases

E-Commerce Systems

Checkout flow analysis
Payment latency debugging
Inventory synchronization tracing

Banking Platforms

Transaction traceability
Fraud investigation
Audit workflows

Streaming Platforms

Recommendation pipeline analysis
Media delivery debugging
Event processing visibility

Interview Questions and Answers

What is distributed tracing?

Distributed tracing tracks requests across multiple distributed services using trace IDs and spans.

What is a span?

A span represents a single operation inside a distributed trace.

What is the purpose of Zipkin?

Zipkin collects, stores, and visualizes distributed traces.

What replaced Spring Cloud Sleuth?

Micrometer Tracing replaced Spring Cloud Sleuth in Spring Boot 3.

Why is trace propagation important?

Trace propagation ensures requests remain connected across distributed services.

What is trace sampling?

Sampling limits how many traces are collected to reduce overhead and storage costs.

Frequently Asked Questions

Is distributed tracing only for microservices?

No. Even monolithic systems can benefit from tracing complex workflows.

Can Zipkin work with Kubernetes?

Yes. Zipkin is commonly deployed in Kubernetes environments.

Does tracing impact performance?

Yes, but sampling and lean span metadata keep the overhead small.

Can tracing work with Kafka?

Yes. Trace context can propagate through Kafka message headers.

What is the difference between metrics and traces?

Metrics provide aggregated system measurements, while traces show individual request journeys.

Should production systems trace every request?

Usually no. Most enterprise systems use sampling strategies.

Summary

Distributed tracing is one of the most important observability techniques in modern cloud-native systems.

In this guide, you learned:

How traces and spans work
How Micrometer Tracing integrates with Spring Boot
How Zipkin visualizes distributed traces
How context propagation works
How tracing supports REST APIs and Kafka systems
How enterprises debug distributed systems
Production observability best practices

Modern distributed systems cannot operate reliably without strong observability.

Distributed tracing enables:

Faster debugging
Better reliability
Performance optimization
Improved incident response
Deep operational visibility

Mastering observability and distributed tracing is essential for backend engineers, platform engineers, SREs, and cloud-native architects.