Implementing Distributed Tracing with Micrometer and Zipkin
Modern microservices architectures introduce enormous operational complexity. A single client request may travel through API gateways, authentication services, business services, databases, caches, message brokers, and external APIs before a response is returned.
In monolithic applications, debugging a request failure was relatively simple because logs existed inside a single application process. In distributed systems, however, requests hop across dozens or even hundreds of services.
This creates one of the biggest challenges in cloud-native engineering:
How do you trace a request across multiple distributed microservices?
This is where distributed tracing becomes essential.
Distributed tracing allows engineers to follow an individual request as it moves across services, databases, queues, and infrastructure components. It helps identify bottlenecks, latency issues, failures, retries, and service dependencies.
This guide explains how distributed tracing works internally, how Micrometer Tracing integrates with Spring Boot, how Zipkin visualizes traces, and how production systems run tracing at scale.
What You Will Learn
- What distributed tracing is
- Why distributed tracing is critical in microservices
- How traces and spans work internally
- What trace IDs and span IDs are
- How Micrometer Tracing works in Spring Boot
- How Zipkin collects distributed traces
- How request propagation works between services
- How tracing integrates with REST APIs and messaging systems
- How to configure tracing in Spring Boot 3
- How to debug distributed systems using tracing
- How sampling works
- Production best practices for observability
- Security and performance considerations
- How enterprises use tracing platforms
What Is Distributed Tracing?
Distributed tracing is an observability technique used to track requests across multiple distributed services.
Each incoming request receives a unique trace identifier. As the request moves between services, that trace ID propagates across the system.
Client Request
      |
      v
API Gateway
      |
      v
User Service
      |
      v
Order Service
      |
      v
Payment Service
      |
      v
Notification Service
Without distributed tracing:
- Logs become disconnected
- Root cause analysis becomes difficult
- Latency bottlenecks remain hidden
- Cross-service failures are hard to diagnose
With distributed tracing:
- Every request is traceable end-to-end
- Latency breakdowns become visible
- Dependency chains become observable
- Failures can be identified quickly
Why Distributed Tracing Matters in Microservices
As systems scale, requests become increasingly complex.
Typical Enterprise Request Flow
User Request
     |
     v
API Gateway
     |
     +------ Authentication Service
     |
     +------ User Service
     |
     +------ Inventory Service
     |
     +------ Order Service
     |
     +------ Payment Service
     |
     +------ Kafka Event
     |
     +------ Notification Service
A single request may involve:
- REST calls
- Database queries
- Message queues
- External APIs
- Caches
- Retries
- Circuit breakers
Without observability, debugging production failures becomes nearly impossible.
Core Concepts: Trace and Span
What Is a Trace?
A trace represents the entire lifecycle of a request across distributed systems.
Every request receives a unique Trace ID.
What Is a Span?
A span represents a single operation inside a trace.
Examples:
- HTTP request processing
- Database query
- Kafka message publishing
- External API call
Trace Hierarchy
Trace ID: abc123

Span 1: API Gateway
     |
     +---- Span 2: User Service
     |
     +---- Span 3: Order Service
     |
     +---- Span 4: Database Query
     |
     +---- Span 5: Payment API Call
Each span contains:
- Span ID
- Parent Span ID
- Operation name
- Start timestamp
- End timestamp
- Tags and metadata
- Error information
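The fields listed above map naturally onto a small data type. The following is a minimal sketch in plain Java, not the span class used by Micrometer or Brave; the name `SpanRecord` and its field names are assumptions chosen for illustration, and timestamps are assumed to be epoch microseconds, which is the unit Zipkin's v2 span format uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative span model; not the Micrometer or Brave span type. */
final class SpanRecord {
    final String traceId;       // shared by every span in the same trace
    final String spanId;        // unique per operation
    final String parentSpanId;  // null for the root span
    final String name;          // operation name
    final long startMicros;     // start timestamp, epoch microseconds
    final long endMicros;       // end timestamp, epoch microseconds
    final Map<String, String> tags = new LinkedHashMap<>();
    String error;               // null when the operation succeeded

    SpanRecord(String traceId, String spanId, String parentSpanId,
               String name, long startMicros, long endMicros) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.name = name;
        this.startMicros = startMicros;
        this.endMicros = endMicros;
    }

    /** A span without a parent is the entry point of the trace. */
    boolean isRoot() {
        return parentSpanId == null;
    }

    long durationMicros() {
        return endMicros - startMicros;
    }
}
```

The parent span ID is what lets a tracing backend rebuild the hierarchy: every span points at the operation that caused it, and the root span points at nothing.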
Understanding Micrometer Tracing
Spring Boot 3 introduced Micrometer Tracing as the modern replacement for Spring Cloud Sleuth.
Micrometer provides:
- Distributed tracing instrumentation
- Metrics integration
- Observation APIs
- Context propagation
- OpenTelemetry compatibility
Micrometer Tracing Architecture
Incoming Request
      |
      v
Micrometer Observation API
      |
      v
Trace + Span Creation
      |
      v
Context Propagation
      |
      v
Zipkin Exporter
      |
      v
Zipkin Server
Micrometer automatically instruments Spring Boot applications with minimal configuration.
What Is Zipkin?
Zipkin is a distributed tracing system used for collecting and visualizing traces.
It provides:
- Trace visualization
- Latency analysis
- Dependency graphs
- Error tracking
- Performance bottleneck analysis
Zipkin Architecture
Microservices
      |
      v
Micrometer Tracing
      |
      v
Zipkin Exporter
      |
      v
Zipkin Server
      |
      v
Trace Storage
      |
      v
Zipkin UI
Setting Up Zipkin
Run Zipkin Using Docker
docker run -d -p 9411:9411 openzipkin/zipkin
Zipkin Dashboard
After startup:
http://localhost:9411
The dashboard allows engineers to:
- Search traces
- Analyze request timelines
- Inspect spans
- Identify slow services
Spring Boot Micrometer Dependencies
Maven Dependencies
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>
These dependencies enable:
- Tracing instrumentation
- Brave tracing bridge
- Zipkin integration
- Actuator observability
Configuring Micrometer Tracing
application.yml Configuration
spring:
  application:
    name: order-service

management:
  tracing:
    sampling:
      probability: 1.0
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
Important Configuration Properties
| Property | Purpose |
|---|---|
| spring.application.name | Service identification |
| management.tracing.sampling.probability | Trace sampling rate (0.0 to 1.0) |
| management.zipkin.tracing.endpoint | Zipkin collector endpoint |
Request Flow with Distributed Tracing
Client Request
      |
      v
Gateway Service
      |
      |  Trace ID Generated
      v
Order Service
      |
      |  Trace ID Propagated
      v
Payment Service
      |
      |  Trace ID Propagated
      v
Database Query
      |
      v
Response Returned
Every service receives the same trace ID while generating unique span IDs.
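As a rough sketch of where these identifiers come from: the W3C Trace Context format uses 128-bit trace IDs rendered as 32 lowercase hex characters and 64-bit span IDs rendered as 16 hex characters, and B3 accepts the same shapes. The class name `TraceIds` is hypothetical; real tracers ship their own generators, so this only illustrates the format.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative ID generator; real tracers provide their own. */
final class TraceIds {
    private TraceIds() {}

    /** 128-bit trace ID as 32 lowercase hex characters. */
    static String newTraceId() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        return toHex16(random.nextLong()) + toHex16(random.nextLong());
    }

    /** 64-bit span ID as 16 lowercase hex characters. */
    static String newSpanId() {
        return toHex16(ThreadLocalRandom.current().nextLong());
    }

    // Zero-padded so that every 64-bit value renders as exactly 16 characters.
    private static String toHex16(long value) {
        return String.format("%016x", value);
    }
}
```

The gateway would call `newTraceId()` once per incoming request, while every service calls `newSpanId()` for each operation it records under that trace.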
Understanding Context Propagation
One of the most critical features in distributed tracing is context propagation.
Context propagation ensures:
- Trace IDs move across services
- Spans remain connected
- Request lineage remains intact
HTTP Headers Used
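X-B3-TraceId
X-B3-SpanId
X-B3-ParentSpanId
X-B3-Sampled
These are the B3 multi-header propagation fields. Spring Boot 3 defaults to the W3C Trace Context format, which carries the same information in a single traceparent header; the management.tracing.propagation.type property selects the format (for example b3_multi in recent Spring Boot 3 releases for the headers listed here). Whichever format is configured, instrumented clients and servers write and read the headers automatically.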
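To make the role of each B3 header concrete, here is a minimal sketch that models HTTP headers as a plain `Map`. The class `B3Propagation` and its methods `inject` and `childOf` are hypothetical names for illustration; in a real application the tracing library performs these steps inside the HTTP client and server instrumentation.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of B3 multi-header propagation. */
final class B3Propagation {
    private B3Propagation() {}

    /** Caller side: write the current span's context into outgoing headers. */
    static Map<String, String> inject(String traceId, String spanId,
                                      String parentSpanId, boolean sampled) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
        return headers;
    }

    /** Callee side: derive the context for a new child span. */
    static Map<String, String> childOf(Map<String, String> incoming, String newSpanId) {
        Map<String, String> child = new HashMap<>();
        child.put("X-B3-TraceId", incoming.get("X-B3-TraceId"));     // trace ID never changes
        child.put("X-B3-ParentSpanId", incoming.get("X-B3-SpanId")); // caller becomes the parent
        child.put("X-B3-SpanId", newSpanId);                          // fresh ID for this operation
        String sampled = incoming.get("X-B3-Sampled");
        if (sampled != null) {
            child.put("X-B3-Sampled", sampled);                       // sampling decision is inherited
        }
        return child;
    }
}
```

The trace ID and sampling decision are copied unchanged at every hop, while the span ID of the caller moves into the parent slot. That shift is what keeps spans connected into one tree.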
Sample REST Controller with Tracing
package com.example.orderservice.controller;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    @GetMapping("/orders")
    public String getOrders() {
        return "Orders Retrieved Successfully";
    }
}
Micrometer automatically creates spans around HTTP requests.
Creating Custom Spans
Engineers can create manual spans for important business operations.
Custom Span Example
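package com.example.orderservice.service;

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private final Tracer tracer;

    public PaymentService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void processPayment() {
        // Creates a child of the current span, or starts a new trace if none is active.
        Span span = tracer.nextSpan()
                .name("payment-processing");

        // Start the span and make it the current span for the duration of the block.
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            Thread.sleep(500); // placeholder for the real payment work
        } catch (Exception ex) {
            span.error(ex); // record the failure on the span
        } finally {
            span.end(); // always end the span so that it is reported
        }
    }
}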
Custom spans help trace critical business workflows.
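The try-with-resources block in the example is what makes the span "current" while the work runs. As a simplified model of that idea, and explicitly not Micrometer's actual implementation, a scope can be sketched with a `ThreadLocal` that restores the previous value on close. `CurrentSpan` and `Scope` are hypothetical names, and spans are reduced to their names to keep the sketch short.

```java
/** Simplified model of a "span in scope"; not Micrometer's implementation. */
final class CurrentSpan {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private CurrentSpan() {}

    /** Returns the name of the span in scope on this thread, or null. */
    static String get() {
        return CURRENT.get();
    }

    /** Makes the given span current until the returned scope is closed. */
    static Scope open(String spanName) {
        String previous = CURRENT.get();
        CURRENT.set(spanName);
        return new Scope(previous);
    }

    static final class Scope implements AutoCloseable {
        private final String previous;

        private Scope(String previous) {
            this.previous = previous;
        }

        @Override
        public void close() {
            // Restore whatever was current before, so nested scopes unwind correctly.
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }
}
```

Because the state lives in a thread-local, work handed to another thread does not see the current span unless the context is carried over explicitly. That is a common reason custom spans end up detached from their trace.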
Distributed Tracing with RestTemplate
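Spring Boot instruments RestTemplate instances that are built from the auto-configured RestTemplateBuilder, so trace context propagates on outgoing calls. A RestTemplate created directly with new RestTemplate() is not instrumented and will not send trace headers.
RestTemplate Configuration
@Bean
public RestTemplate restTemplate(RestTemplateBuilder builder) {
    // Build from the auto-configured builder so that tracing is applied.
    return builder.build();
}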
Service Call Example
String response = restTemplate.getForObject(
        "http://payment-service/payments",
        String.class
);
Trace headers propagate automatically.
Distributed Tracing with OpenFeign
Feign clients take part in Micrometer tracing once the io.github.openfeign:feign-micrometer dependency is on the classpath.
Feign Client Example
@FeignClient(name = "payment-service")
public interface PaymentClient {

    @GetMapping("/payments")
    String getPayments();
}
Trace context automatically flows between services.
Tracing Kafka and Event-Driven Systems
Distributed tracing becomes even more important in asynchronous systems.
Kafka Trace Flow
Order Service
      |
      v
Publish Kafka Event
      |
      v
Kafka Broker
      |
      v
Inventory Service
      |
      v
Notification Service
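Micrometer propagates tracing metadata inside Kafka message headers. With Spring for Apache Kafka this requires observation to be enabled on both the KafkaTemplate and the listener container (setObservationEnabled(true)); it is off by default.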
Why This Matters
- Tracks async event chains
- Identifies slow consumers
- Debugs event processing failures
- Improves observability in event-driven architectures
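Kafka record headers are key and byte-array pairs, so trace context has to be serialized on the producer side and parsed on the consumer side. The sketch below models headers as a `Map<String, byte[]>` rather than using Kafka client classes, and uses the W3C `traceparent` layout (`version-traceId-parentId-flags`) as the wire format. The class `MessageTraceContext` is hypothetical, and which header format an application actually writes depends on its propagation configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Illustrative trace context carrier for message headers. */
final class MessageTraceContext {
    static final String HEADER = "traceparent";

    private MessageTraceContext() {}

    /** Formats a W3C traceparent value: version-traceId-parentId-flags. */
    static String format(String traceId, String spanId, boolean sampled) {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    /** Producer side: attach the trace context to the outgoing message. */
    static void inject(Map<String, byte[]> headers,
                       String traceId, String spanId, boolean sampled) {
        headers.put(HEADER, format(traceId, spanId, sampled).getBytes(StandardCharsets.UTF_8));
    }

    /** Consumer side: read the trace ID back, or null if no valid context was sent. */
    static String extractTraceId(Map<String, byte[]> headers) {
        byte[] raw = headers.get(HEADER);
        if (raw == null) {
            return null;
        }
        String[] parts = new String(raw, StandardCharsets.UTF_8).split("-");
        return parts.length == 4 ? parts[1] : null;
    }
}
```

A consumer that extracts the trace ID this way can continue the producer's trace, even if the message sat in the broker for minutes before being processed.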
Sampling in Distributed Tracing
Tracing every request in high-traffic systems can become extremely expensive.
Sampling controls how many traces are collected.
Sampling Example
management:
  tracing:
    sampling:
      probability: 0.1
This samples roughly 10% of traces. The decision is probabilistic, so the exact share varies over short time windows.
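One way to picture a probability sampler is as a threshold test on the trace ID: the ID is effectively random, so comparing it against a fraction of the value range keeps about that fraction of traces, and every service that sees the same ID reaches the same verdict. The class `ProbabilitySampler` below is a hypothetical illustration of that idea, not the sampler Micrometer or Brave uses.

```java
/** Illustrative trace-ID-based probability sampler. */
final class ProbabilitySampler {
    private final double probability;

    ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.probability = probability;
    }

    /** Decides from the low 64 bits of the trace ID whether to keep the trace. */
    boolean isSampled(long traceIdLow) {
        if (probability >= 1.0) {
            return true;
        }
        if (probability <= 0.0) {
            return false;
        }
        long positive = traceIdLow & Long.MAX_VALUE;          // clear the sign bit
        return positive < (long) (probability * Long.MAX_VALUE);
    }
}
```

Deriving the decision from the trace ID rather than from a fresh random number means a trace is either recorded in full or not at all, which avoids partial traces with missing spans.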
Why Sampling Matters
- Reduces storage costs
- Improves performance
- Limits network overhead
- Prevents observability platform overload
Understanding Trace Visualization in Zipkin
Zipkin provides waterfall-style trace visualizations.
Trace ID: 981273981273

Gateway Service   [=========]
User Service      [====]
Order Service     [=========]
Payment Service   [======]
Database Query    [==]
Engineers can instantly identify:
- Slow services
- Latency bottlenecks
- Retry storms
- Timeout chains
- Failed spans
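The waterfall view is essentially each span drawn as a bar whose left edge is its start offset and whose width is its duration. A minimal text rendering, under the assumption that offsets and durations are already known in milliseconds, might look like this; `Waterfall.bar` is a hypothetical helper and has nothing to do with how the Zipkin UI is implemented.

```java
/** Illustrative text renderer for a span waterfall. */
final class Waterfall {
    private Waterfall() {}

    /**
     * Renders one span as a bar. Each character covers msPerChar milliseconds;
     * the bar is shifted right by the span's start offset within the trace.
     */
    static String bar(String name, long offsetMs, long durationMs, long msPerChar) {
        int indent = (int) (offsetMs / msPerChar);
        int width = (int) Math.max(1, durationMs / msPerChar); // always draw at least one cell
        return String.format("%-16s", name)
                + " ".repeat(indent)
                + "[" + "=".repeat(width) + "]";
    }
}
```

Calling `bar` once per span, ordered by start time, produces the staircase shape that makes gaps, long bars, and sequential calls easy to spot.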
Production Observability Architecture
                +----------------------+
                |   Client Requests    |
                +----------+-----------+
                           |
                           v
                +----------------------+
                |     API Gateway      |
                +----------+-----------+
                           |
       -------------------------------------------
       |                   |                     |
       v                   v                     v
+-------------+    +---------------+    +----------------+
|  User Svc   |    | Order Service |    |  Payment Svc   |
+------+------+    +-------+-------+    +--------+-------+
       |                   |                     |
       -------------------------------------------
                           |
                           v
                +----------------------+
                |  Micrometer Tracing  |
                +----------+-----------+
                           |
                           v
                +----------------------+
                |   Zipkin Collector   |
                +----------+-----------+
                           |
                           v
                +----------------------+
                |   Zipkin Dashboard   |
                +----------------------+
Debugging Production Issues Using Tracing
Example Production Incident
Users complain that checkout requests take 8 seconds.
Tracing Analysis
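Gateway Service       50ms
Order Service        120ms
Payment Service     6500ms
Database Query      6200ms
Tracing immediately reveals the bottleneck: the Payment Service span dominates the request, and a single 6200ms database query accounts for most of it.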
Without tracing, identifying this issue may take hours.
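The analysis above can be automated by computing each span's self time, meaning its own duration minus the time spent in its direct children. The sketch below uses the durations from the incident and assumes only that the database query span is a child of the Payment Service span; the span IDs, the class `SelfTime`, and its nested `Timing` type are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative self-time calculation for finding a bottleneck span. */
final class SelfTime {
    private SelfTime() {}

    /** Minimal timing view of a span; parentId is null when no parent is known. */
    static final class Timing {
        final String id;
        final String parentId;
        final String name;
        final long durationMs;

        Timing(String id, String parentId, String name, long durationMs) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
            this.durationMs = durationMs;
        }
    }

    /** Self time per span ID; assumes direct children do not overlap in time. */
    static Map<String, Long> selfTimes(List<Timing> spans) {
        Map<String, Long> childTotals = new HashMap<>();
        for (Timing span : spans) {
            if (span.parentId != null) {
                childTotals.merge(span.parentId, span.durationMs, Long::sum);
            }
        }
        Map<String, Long> result = new HashMap<>();
        for (Timing span : spans) {
            result.put(span.id, span.durationMs - childTotals.getOrDefault(span.id, 0L));
        }
        return result;
    }

    /** Name of the span with the largest self time, or null for an empty list. */
    static String slowest(List<Timing> spans) {
        Map<String, Long> self = selfTimes(spans);
        String name = null;
        long max = Long.MIN_VALUE;
        for (Timing span : spans) {
            long value = self.get(span.id);
            if (value > max) {
                max = value;
                name = span.name;
            }
        }
        return name;
    }
}
```

With the incident's numbers, the Payment Service keeps only 300ms of self time while the database query keeps all 6200ms, so the query is where the investigation should start.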
Performance Considerations
Distributed tracing introduces overhead.
Potential Costs
- Extra network traffic
- Additional storage usage
- Serialization overhead
- Context propagation costs
Optimization Strategies
- Use sampling
- Limit span metadata
- Avoid excessive custom spans
- Use asynchronous exporters
- Compress trace payloads
Security Considerations
Tracing systems can accidentally expose sensitive data.
Never Store
- Passwords
- JWT tokens
- Credit card data
- Personally identifiable information
Best Practices
- Sanitize logs and spans
- Encrypt observability traffic
- Restrict dashboard access
- Apply retention policies
- Mask confidential attributes
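Masking is easiest to enforce at the point where tags are attached to a span. The following is a minimal sketch of a key-based deny-list; the class `TagSanitizer` and the key fragments it checks are assumptions for illustration, and a real deployment would need a list that matches its own attribute names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative tag sanitizer that masks values of sensitive-looking keys. */
final class TagSanitizer {
    // Hypothetical deny-list of key fragments; extend it for your own attributes.
    private static final Set<String> SENSITIVE =
            Set.of("password", "authorization", "token", "card", "ssn", "email");

    private TagSanitizer() {}

    /** Returns a copy of the tags with sensitive values replaced by a mask. */
    static Map<String, String> sanitize(Map<String, String> tags) {
        Map<String, String> clean = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tags.entrySet()) {
            String value = isSensitive(entry.getKey()) ? "***" : entry.getValue();
            clean.put(entry.getKey(), value);
        }
        return clean;
    }

    /** Case-insensitive substring match against the deny-list. */
    static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        for (String fragment : SENSITIVE) {
            if (lower.contains(fragment)) {
                return true;
            }
        }
        return false;
    }
}
```

A deny-list only catches keys someone thought of in advance. An allow-list of approved tag names is stricter and may be a better fit for regulated environments.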
Micrometer Tracing vs Spring Cloud Sleuth
| Feature | Spring Cloud Sleuth | Micrometer Tracing |
|---|---|---|
| Spring Boot Version | Spring Boot 2 | Spring Boot 3 |
| Status | Maintenance only; no Spring Boot 3 support | Recommended |
| OpenTelemetry Support | Limited | Via the OpenTelemetry bridge |
| Observation API | No | Yes |
Enterprise Best Practices
- Always propagate trace context
- Use standardized span naming
- Monitor high-latency spans
- Combine tracing with metrics and logs
- Implement centralized observability
- Use correlation IDs consistently
- Apply intelligent trace sampling
- Monitor observability platform health
Common Production Problems
Missing Trace Context
Requests lose trace IDs between services.
Broken Span Relationships
Parent-child relationships become disconnected.
Excessive Span Volume
High traffic overwhelms tracing infrastructure.
Incorrect Sampling Configuration
Critical traces may be lost accidentally.
Large Span Payloads
Excessive metadata impacts performance.
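Of the problems above, broken span relationships are straightforward to detect mechanically: a span is orphaned when it names a parent that was never reported for the same trace. The sketch below assumes the spans of one trace are available as a map from span ID to parent span ID; `OrphanSpans` is a hypothetical helper, not part of Zipkin or Micrometer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative check for spans whose parent is missing from the trace. */
final class OrphanSpans {
    private OrphanSpans() {}

    /**
     * @param parentBySpanId span ID mapped to its parent span ID (null for the root span)
     * @return IDs of spans whose parent was never reported, in iteration order
     */
    static List<String> find(Map<String, String> parentBySpanId) {
        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, String> entry : parentBySpanId.entrySet()) {
            String parent = entry.getValue();
            if (parent != null && !parentBySpanId.containsKey(parent)) {
                orphans.add(entry.getKey());
            }
        }
        return orphans;
    }
}
```

Orphans often point at a service in the middle of the call chain that is not instrumented, dropped its spans, or made a different sampling decision than its neighbours.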
Real-World Enterprise Use Cases
E-Commerce Systems
- Checkout flow analysis
- Payment latency debugging
- Inventory synchronization tracing
Banking Platforms
- Transaction traceability
- Fraud investigation
- Audit workflows
Streaming Platforms
- Recommendation pipeline analysis
- Media delivery debugging
- Event processing visibility
Interview Questions and Answers
What is distributed tracing?
Distributed tracing tracks requests across multiple distributed services using trace IDs and spans.
What is a span?
A span represents a single operation inside a distributed trace.
What is the purpose of Zipkin?
Zipkin collects, stores, and visualizes distributed traces.
What replaced Spring Cloud Sleuth?
Micrometer Tracing replaced Spring Cloud Sleuth in Spring Boot 3.
Why is trace propagation important?
Trace propagation ensures requests remain connected across distributed services.
What is trace sampling?
Sampling limits how many traces are collected to reduce overhead and storage costs.
Frequently Asked Questions
Is distributed tracing only for microservices?
No. Even monolithic systems can benefit from tracing complex workflows.
Can Zipkin work with Kubernetes?
Yes. Zipkin is commonly deployed in Kubernetes environments.
Does tracing impact performance?
Yes, but proper sampling and optimization minimize the overhead.
Can tracing work with Kafka?
Yes. Trace context can propagate through Kafka message headers.
What is the difference between metrics and traces?
Metrics provide aggregated system measurements, while traces show individual request journeys.
Should production systems trace every request?
Usually no. Most enterprise systems use sampling strategies.
Summary
Distributed tracing is one of the most important observability techniques in modern cloud-native systems.
In this guide, you learned:
- How traces and spans work
- How Micrometer Tracing integrates with Spring Boot
- How Zipkin visualizes distributed traces
- How context propagation works
- How tracing supports REST APIs and Kafka systems
- How enterprises debug distributed systems
- Production observability best practices
Modern distributed systems cannot operate reliably without strong observability.
Distributed tracing enables:
- Faster debugging
- Better reliability
- Performance optimization
- Improved incident response
- Deep operational visibility
Mastering observability and distributed tracing is essential for backend engineers, platform engineers, SREs, and cloud-native architects.