Published: 2026-06-01 • Updated: 2026-06-20

Implementing Distributed Tracing with Micrometer and Zipkin

Modern microservices architectures introduce enormous operational complexity. A single client request may travel through API gateways, authentication services, business services, databases, caches, message brokers, and external APIs before a response is returned.

In monolithic applications, debugging a request failure was relatively simple because logs existed inside a single application process. In distributed systems, however, requests hop across dozens or even hundreds of services.

This creates one of the biggest challenges in cloud-native engineering:

How do you trace a request across multiple distributed microservices?

This is where distributed tracing becomes essential.

Distributed tracing allows engineers to follow an individual request as it moves across services, databases, queues, and infrastructure components. It helps identify bottlenecks, latency issues, failures, retries, and service dependencies.

In this guide, you will learn how distributed tracing works internally, how Micrometer Tracing integrates with Spring Boot, how Zipkin visualizes distributed traces, and how production systems implement observability at scale.


What You Will Learn

  • What distributed tracing is
  • Why distributed tracing is critical in microservices
  • How traces and spans work internally
  • What trace IDs and span IDs are
  • How Micrometer Tracing works in Spring Boot
  • How Zipkin collects distributed traces
  • How request propagation works between services
  • How tracing integrates with REST APIs and messaging systems
  • How to configure tracing in Spring Boot 3
  • How to debug distributed systems using tracing
  • How sampling works
  • Production best practices for observability
  • Security and performance considerations
  • How enterprises use tracing platforms

What Is Distributed Tracing?

Distributed tracing is an observability technique used to track requests across multiple distributed services.

Each incoming request receives a unique trace identifier. As the request moves between services, that trace ID propagates across the system.


Client Request
|
v
API Gateway
|
v
User Service
|
v
Order Service
|
v
Payment Service
|
v
Notification Service 

Without distributed tracing:

  • Logs become disconnected
  • Root cause analysis becomes difficult
  • Latency bottlenecks remain hidden
  • Cross-service failures are hard to diagnose

With distributed tracing:

  • Every request is traceable end-to-end
  • Latency breakdowns become visible
  • Dependency chains become observable
  • Failures can be identified quickly

Why Distributed Tracing Matters in Microservices

As systems scale, requests become increasingly complex.

Typical Enterprise Request Flow


User Request
|
v
API Gateway
|
+------ Authentication Service
|
+------ User Service
|
+------ Inventory Service
|
+------ Order Service
|
+------ Payment Service
|
+------ Kafka Event
|
+------ Notification Service 

A single request may involve:

  • REST calls
  • Database queries
  • Message queues
  • External APIs
  • Caches
  • Retries
  • Circuit breakers

Without observability, debugging production failures becomes nearly impossible.

Core Concepts: Trace and Span

What Is a Trace?

A trace represents the entire lifecycle of a request across distributed systems.

Every request receives a unique Trace ID.
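
As a rough illustration of what such an identifier looks like, the sketch below generates a 128-bit trace ID (32 lowercase hex characters) and a 64-bit span ID (16 lowercase hex characters), the lengths commonly used by the B3 and W3C Trace Context formats. The class name TraceIds is hypothetical and not part of Micrometer or Brave; in a real application the tracer generates these values for you.

```java
import java.security.SecureRandom;

// Hypothetical helper for illustration only; real tracers generate IDs themselves.
final class TraceIds {

    private static final SecureRandom RANDOM = new SecureRandom();

    // 128-bit trace ID rendered as 32 lowercase hex characters.
    static String newTraceId() {
        return randomHex64() + randomHex64();
    }

    // 64-bit span ID rendered as 16 lowercase hex characters.
    static String newSpanId() {
        return randomHex64();
    }

    // Formats one random 64-bit value as zero-padded lowercase hex.
    private static String randomHex64() {
        return String.format("%016x", RANDOM.nextLong());
    }

    public static void main(String[] args) {
        System.out.println("traceId=" + newTraceId() + " spanId=" + newSpanId());
    }
}
```

The trace ID stays the same for the whole request, while each operation inside it gets its own span ID.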

What Is a Span?

A span represents a single operation inside a trace.

Examples:

  • HTTP request processing
  • Database query
  • Kafka message publishing
  • External API call

Trace Hierarchy


Trace ID: abc123

Span 1: API Gateway
|
+---- Span 2: User Service
|
+---- Span 3: Order Service
|
+---- Span 4: Database Query
|
+---- Span 5: Payment API Call 

Each span contains:

  • Span ID
  • Parent Span ID
  • Operation name
  • Start timestamp
  • End timestamp
  • Tags and metadata
  • Error information
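
The fields listed above can be pictured as a small data structure. The following is a minimal sketch, assuming a hypothetical SpanRecord type; Micrometer and Brave use their own span classes, so treat this only as a mental model of how parent IDs link spans into a tree.

```java
import java.util.Map;

// Hypothetical data model for illustration; not a Micrometer or Brave type.
record SpanRecord(
        String traceId,
        String spanId,
        String parentSpanId,      // null for the root span
        String name,
        long startEpochMicros,
        long endEpochMicros,
        Map<String, String> tags,
        String error) {           // null when the operation succeeded

    boolean isRoot() {
        return parentSpanId == null;
    }

    long durationMicros() {
        return endEpochMicros - startEpochMicros;
    }

    // A span is a child of another when both share a trace and the parent ID matches.
    boolean isChildOf(SpanRecord other) {
        return traceId.equals(other.traceId) && other.spanId.equals(parentSpanId);
    }

    public static void main(String[] args) {
        SpanRecord gateway = new SpanRecord("abc123", "1", null, "API Gateway", 0, 900, Map.of(), null);
        SpanRecord user = new SpanRecord("abc123", "2", "1", "User Service", 100, 400, Map.of(), null);
        System.out.println(user.name() + " is child of " + gateway.name() + ": " + user.isChildOf(gateway));
        System.out.println(user.name() + " duration (us): " + user.durationMicros());
    }
}
```

A tracing backend rebuilds the hierarchy shown earlier by following parentSpanId links within one trace ID.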

Understanding Micrometer Tracing

Spring Boot 3 introduced Micrometer Tracing as the modern replacement for Spring Cloud Sleuth.

Micrometer provides:

  • Distributed tracing instrumentation
  • Metrics integration
  • Observation APIs
  • Context propagation
  • OpenTelemetry compatibility

Micrometer Tracing Architecture


Incoming Request
|
v
Micrometer Observation API
|
v
Trace + Span Creation
|
v
Context Propagation
|
v
Zipkin Exporter
|
v
Zipkin Server 

Micrometer automatically instruments Spring Boot applications with minimal configuration.

What Is Zipkin?

Zipkin is a distributed tracing system used for collecting and visualizing traces.

It provides:

  • Trace visualization
  • Latency analysis
  • Dependency graphs
  • Error tracking
  • Performance bottleneck analysis

Zipkin Architecture


Microservices
|
v
Micrometer Tracing
|
v
Zipkin Exporter
|
v
Zipkin Server
|
v
Trace Storage
|
v
Zipkin UI 

Setting Up Zipkin

Run Zipkin Using Docker


docker run -d -p 9411:9411 openzipkin/zipkin 

Zipkin Dashboard

After startup:


http://localhost:9411 

The dashboard allows engineers to:

  • Search traces
  • Analyze request timelines
  • Inspect spans
  • Identify slow services

Spring Boot Micrometer Dependencies

Maven Dependencies



<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>

<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>


These dependencies enable:

  • Tracing instrumentation
  • Brave tracing bridge
  • Zipkin integration
  • Actuator observability

Configuring Micrometer Tracing

application.yml Configuration


spring:
  application:
    name: order-service

management:
  tracing:
    sampling:
      probability: 1.0

  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans

  endpoints:
    web:
      exposure:
        include: health,info,prometheus

Important Configuration Properties

Property                                   Purpose
spring.application.name                    Service identification
management.tracing.sampling.probability    Trace sampling rate
management.zipkin.tracing.endpoint         Zipkin collector endpoint

Request Flow with Distributed Tracing


Client Request
|
v
Gateway Service
|
| Trace ID Generated
|
v
Order Service
|
| Trace ID Propagated
|
v
Payment Service
|
| Trace ID Propagated
|
v
Database Query
|
v
Response Returned 

Every service receives the same trace ID while generating unique span IDs.

Understanding Context Propagation

One of the most critical features in distributed tracing is context propagation.

Context propagation ensures:

  • Trace IDs move across services
  • Spans remain connected
  • Request lineage remains intact

HTTP Headers Used


X-B3-TraceId
X-B3-SpanId
X-B3-ParentSpanId
X-B3-Sampled 

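These are the B3 multi-header names, and instrumented clients and servers write and read them automatically. Note that Spring Boot 3 defaults to the W3C Trace Context format, which carries the same information in a single traceparent header; B3 is selected through the management.tracing.propagation.type property.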
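
To make the mechanics concrete, here is a minimal sketch of what instrumentation does with the four B3 headers listed above: the caller writes its current context into the outgoing request, and the callee keeps the trace ID, treats the caller's span as its parent, and uses a fresh span ID. The class B3HeaderSketch and its methods are hypothetical, and the sketch assumes the callee starts a child span rather than joining the caller's span. Real HTTP headers are also case-insensitive, which a plain map does not model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of B3 multi-header propagation; real instrumentation does this for you.
final class B3HeaderSketch {

    record TraceContext(String traceId, String spanId, String parentSpanId, boolean sampled) {}

    // Caller side: copy the current span's context into outgoing HTTP headers.
    static Map<String, String> inject(TraceContext current) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", current.traceId());
        headers.put("X-B3-SpanId", current.spanId());
        if (current.parentSpanId() != null) {
            headers.put("X-B3-ParentSpanId", current.parentSpanId());
        }
        headers.put("X-B3-Sampled", current.sampled() ? "1" : "0");
        return headers;
    }

    // Callee side: same trace ID, caller's span becomes the parent, new span ID.
    static TraceContext extractAsChild(Map<String, String> headers, String newSpanId) {
        return new TraceContext(
                headers.get("X-B3-TraceId"),
                newSpanId,
                headers.get("X-B3-SpanId"),
                "1".equals(headers.get("X-B3-Sampled")));
    }

    public static void main(String[] args) {
        TraceContext gateway = new TraceContext(
                "463ac35c9f6413ad48485a3953bb6124", "a2fb4a1d1a96d312", null, true);
        TraceContext order = extractAsChild(inject(gateway), "0020000000000001");
        System.out.println(order);
    }
}
```

Because the trace ID is copied unchanged at every hop, all spans of one request can later be grouped into a single trace.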

Sample REST Controller with Tracing


package com.example.orderservice.controller;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    @GetMapping("/orders")
    public String getOrders() {
        return "Orders Retrieved Successfully";
    }
}

Micrometer automatically creates spans around HTTP requests.

Creating Custom Spans

Engineers can create manual spans for important business operations.

Custom Span Example


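package com.example.orderservice.service;

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;

import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private final Tracer tracer;

    public PaymentService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void processPayment() {
        // Create a child of the current span, or a new trace if none is active.
        Span span = tracer.nextSpan().name("payment-processing");

        // Put the span in scope so nested instrumentation and log correlation see it.
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            Thread.sleep(500); // Simulated payment work
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // Preserve the interrupt flag
            span.error(ex);
        } finally {
            span.end(); // Always end the span so it is reported
        }
    }
}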

Custom spans help trace critical business workflows.

Distributed Tracing with RestTemplate

Micrometer automatically propagates tracing context across HTTP clients.

RestTemplate Configuration


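@Bean
public RestTemplate restTemplate(RestTemplateBuilder builder) {
    // Build from Spring Boot's auto-configured builder so tracing instrumentation
    // is applied; a plain "new RestTemplate()" is not instrumented.
    return builder.build();
}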

Service Call Example


String response = restTemplate.getForObject(
"http://payment-service/payments",
String.class
); 

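Trace headers propagate automatically when the RestTemplate is built from Spring Boot's auto-configured RestTemplateBuilder. An instance created with new RestTemplate() is not instrumented and will not forward the trace context.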

Distributed Tracing with OpenFeign

Feign clients integrate seamlessly with Micrometer tracing.

Feign Client Example


@FeignClient(name = "payment-service")
public interface PaymentClient {

    @GetMapping("/payments")
    String getPayments();
}

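Trace context flows between services automatically once the Feign client is observed by Micrometer, which in Spring Cloud OpenFeign requires the feign-micrometer dependency on the classpath.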

Related topic:

Client-Side Load Balancing with Spring Cloud LoadBalancer

Tracing Kafka and Event-Driven Systems

Distributed tracing becomes even more important in asynchronous systems.

Kafka Trace Flow


Order Service
|
v
Publish Kafka Event
|
v
Kafka Broker
|
v
Inventory Service
|
v
Notification Service 

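Micrometer propagates tracing metadata inside Kafka message headers. With Spring for Apache Kafka, observation is off by default and has to be enabled on both the KafkaTemplate and the listener container.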
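
The sketch below illustrates the idea with plain Java, assuming the W3C traceparent format (version, trace ID, parent span ID, and flags separated by hyphens). A Map stands in for Kafka record headers, whose values are byte arrays; the class TraceparentHeaderSketch is hypothetical and is not how Micrometer is implemented internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for Kafka record headers (String key, byte[] value).
final class TraceparentHeaderSketch {

    record Context(String traceId, String parentSpanId, boolean sampled) {}

    // Producer side: write the context as a W3C "traceparent" header value.
    static void inject(Context context, Map<String, byte[]> headers) {
        String value = "00-" + context.traceId() + "-" + context.parentSpanId()
                + (context.sampled() ? "-01" : "-00");
        headers.put("traceparent", value.getBytes(StandardCharsets.UTF_8));
    }

    // Consumer side: read the header back, or return null when absent or malformed.
    static Context extract(Map<String, byte[]> headers) {
        byte[] raw = headers.get("traceparent");
        if (raw == null) {
            return null;
        }
        String[] parts = new String(raw, StandardCharsets.UTF_8).split("-");
        if (parts.length != 4
                || !parts[1].matches("[0-9a-f]{32}")
                || !parts[2].matches("[0-9a-f]{16}")
                || !parts[3].matches("[0-9a-f]{2}")) {
            return null;
        }
        // The lowest bit of the flags byte is the "sampled" flag.
        boolean sampled = (Integer.parseInt(parts[3], 16) & 1) == 1;
        return new Context(parts[1], parts[2], sampled);
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new HashMap<>();
        inject(new Context("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true), headers);
        System.out.println(extract(headers));
    }
}
```

Since the context rides along with the message itself, the consumer can continue the same trace even if it processes the event much later than it was published.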

Why This Matters

  • Tracks async event chains
  • Identifies slow consumers
  • Debugs event processing failures
  • Improves observability in event-driven architectures

Related learning:

Event-Driven Microservices with Spring Cloud Stream

Sampling in Distributed Tracing

Tracing every request in high-traffic systems can become extremely expensive.

Sampling controls how many traces are collected.

Sampling Example


management:
  tracing:
    sampling:
      probability: 0.1

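This samples roughly 10% of traces, which is also Spring Boot's default probability. Requests that are not sampled are still processed normally; their spans are simply not exported to Zipkin.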
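
One way to picture probability sampling is a function that maps each trace ID to a bucket and keeps the trace only if the bucket falls below a threshold. The sketch below is an assumption-laden illustration, not the sampler that Micrometer or Brave ships; the class name ProbabilitySampler and the choice of 10,000 buckets are made up for the example. In practice the decision is usually made once at the start of a trace and then propagated with the trace context, so downstream services do not decide again.

```java
// Hypothetical sketch of probability sampling; not the sampler Micrometer or Brave ships.
final class ProbabilitySampler {

    private static final int BUCKETS = 10_000;
    private final int threshold;

    ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.threshold = (int) Math.round(probability * BUCKETS);
    }

    // Maps the trace ID to one of 10,000 buckets; buckets below the threshold are sampled.
    boolean isSampled(String traceIdHex) {
        // Use the last 16 hex characters (the low 64 bits) as the bucket source.
        String low = traceIdHex.substring(Math.max(0, traceIdHex.length() - 16));
        long value = Long.parseUnsignedLong(low, 16);
        return Math.floorMod(value, (long) BUCKETS) < threshold;
    }

    public static void main(String[] args) {
        ProbabilitySampler sampler = new ProbabilitySampler(0.1);
        System.out.println(sampler.isSampled("463ac35c9f6413ad48485a3953bb6124"));
    }
}
```

Deriving the decision from the trace ID keeps it deterministic: the same trace always gets the same answer, so a trace is never half-recorded by this sampler.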

Why Sampling Matters

  • Reduces storage costs
  • Improves performance
  • Limits network overhead
  • Prevents observability platform overload

Understanding Trace Visualization in Zipkin

Zipkin provides waterfall-style trace visualizations.


Trace ID: 981273981273

Gateway Service      [=========]
User Service              [====]
Order Service                 [=========]
Payment Service                   [======]
Database Query                      [==] 

Engineers can instantly identify:

  • Slow services
  • Latency bottlenecks
  • Retry storms
  • Timeout chains
  • Failed spans

Production Observability Architecture

                   +----------------------+
                   |    Client Requests   |
                   +----------+-----------+
                              |
                              v
                   +----------------------+
                   |  API Gateway         |
                   +----------+-----------+
                              |
         +--------------------+----------------------+
         |                    |                      |
         v                    v                      v
  +-------------+     +---------------+     +----------------+
  | User Svc    |     | Order Service |     | Payment Svc    |
  +------+------+     +-------+-------+     +--------+-------+
         |                    |                      |
         +--------------------+----------------------+
                              |
                              v
                   +----------------------+
                   | Micrometer Tracing   |
                   +----------+-----------+
                              |
                              v
                   +----------------------+
                   | Zipkin Collector     |
                   +----------+-----------+
                              |
                              v
                   +----------------------+
                   | Zipkin Dashboard     |
                   +----------------------+

Debugging Production Issues Using Tracing

Example Production Incident

Users complain that checkout requests take 8 seconds.

Tracing Analysis


Gateway Service       50ms
Order Service        120ms
Payment Service     6500ms
Database Query      6200ms 

The trace points straight at the bottleneck: the database query accounts for 6200 ms of the 6500 ms spent in Payment Service.

Without tracing, identifying this issue may take hours.
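
The same analysis can be expressed as a small calculation of self time, meaning a span's own duration minus the time spent in its direct children. The sketch below uses the figures from the incident above and assumes, as an illustration only, that the database query runs inside the Payment Service span and that the other spans have no recorded children. BottleneckFinder and TimedSpan are hypothetical names, and spans are keyed by name, which only works when names are unique.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: finds the span with the largest self time.
final class BottleneckFinder {

    record TimedSpan(String name, String parentName, long durationMillis) {}

    // Self time = own duration minus the durations of direct children.
    static Map<String, Long> selfTimes(List<TimedSpan> spans) {
        Map<String, Long> self = new HashMap<>();
        for (TimedSpan span : spans) {
            self.merge(span.name(), span.durationMillis(), Long::sum);
        }
        for (TimedSpan span : spans) {
            if (span.parentName() != null) {
                // Time spent in a child is not the parent's own work.
                self.merge(span.parentName(), -span.durationMillis(), Long::sum);
            }
        }
        return self;
    }

    static String slowest(List<TimedSpan> spans) {
        return selfTimes(spans).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<TimedSpan> spans = List.of(
                new TimedSpan("Gateway Service", null, 50),
                new TimedSpan("Order Service", null, 120),
                new TimedSpan("Payment Service", null, 6500),
                new TimedSpan("Database Query", "Payment Service", 6200));
        System.out.println("Bottleneck: " + slowest(spans));
    }
}
```

Under these assumptions Payment Service itself contributes only 300 ms, which is why the investigation should start at the query rather than at the service code.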

Performance Considerations

Distributed tracing introduces overhead.

Potential Costs

  • Extra network traffic
  • Additional storage usage
  • Serialization overhead
  • Context propagation costs

Optimization Strategies

  • Use sampling
  • Limit span metadata
  • Avoid excessive custom spans
  • Use asynchronous exporters
  • Compress trace payloads

Security Considerations

Tracing systems can accidentally expose sensitive data.

Never Store

  • Passwords
  • JWT tokens
  • Credit card data
  • Personally identifiable information

Best Practices

  • Sanitize logs and spans
  • Encrypt observability traffic
  • Restrict dashboard access
  • Apply retention policies
  • Mask confidential attributes
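
As one possible way to apply the masking advice above, the sketch below redacts tag values whose keys look sensitive before they would be attached to a span. The class SpanTagSanitizer and its deny-list are assumptions made for this example; a real deny-list should follow your own data classification rules, and key-based matching will not catch secrets hidden inside otherwise harmless values.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: redact sensitive values before attaching them to a span as tags.
final class SpanTagSanitizer {

    // Assumed deny-list of key fragments; adjust to your own rules.
    private static final Set<String> SENSITIVE_FRAGMENTS =
            Set.of("password", "authorization", "token", "card", "ssn");

    // Returns a copy of the tags with sensitive values replaced by a placeholder.
    static Map<String, String> sanitize(Map<String, String> tags) {
        Map<String, String> safe = new LinkedHashMap<>();
        tags.forEach((key, value) -> safe.put(key, isSensitive(key) ? "[REDACTED]" : value));
        return safe;
    }

    private static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        return SENSITIVE_FRAGMENTS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        System.out.println(sanitize(Map.of("http.method", "GET", "user.password", "hunter2")));
    }
}
```

Running every custom tag through a single helper like this makes the rule easy to audit, compared with masking values ad hoc at each call site.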

Micrometer Tracing vs Spring Cloud Sleuth

Feature                  Spring Cloud Sleuth    Micrometer Tracing
Spring Boot Version      Spring Boot 2          Spring Boot 3
Status                   Deprecated             Recommended
OpenTelemetry Support    Limited                Yes (OpenTelemetry bridge)
Observation API          No                     Yes

Enterprise Best Practices

  • Always propagate trace context
  • Use standardized span naming
  • Monitor high-latency spans
  • Combine tracing with metrics and logs
  • Implement centralized observability
  • Use correlation IDs consistently
  • Apply intelligent trace sampling
  • Monitor observability platform health

Common Production Problems

Missing Trace Context

Requests lose trace IDs between services.

Broken Span Relationships

Parent-child relationships become disconnected.
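
A simple check for this problem is to look for spans whose parent ID does not match any span in the same trace. The sketch below shows the idea on exported span data; OrphanSpanDetector and SpanRef are hypothetical names, and the input is assumed to contain the spans of exactly one trace.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: report spans whose parent ID matches no span in the same trace.
final class OrphanSpanDetector {

    record SpanRef(String spanId, String parentSpanId) {}

    static List<SpanRef> findOrphans(List<SpanRef> spansOfOneTrace) {
        Set<String> knownIds = spansOfOneTrace.stream()
                .map(SpanRef::spanId)
                .collect(Collectors.toSet());
        return spansOfOneTrace.stream()
                .filter(span -> span.parentSpanId() != null)              // root spans have no parent
                .filter(span -> !knownIds.contains(span.parentSpanId())) // parent never reported
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SpanRef> spans = List.of(
                new SpanRef("1", null),
                new SpanRef("2", "1"),
                new SpanRef("3", "9"));
        System.out.println(findOrphans(spans));
    }
}
```

An orphan usually means that some hop in between did not report its span or did not forward the context, which narrows down where propagation broke.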

Excessive Span Volume

High traffic overwhelms tracing infrastructure.

Incorrect Sampling Configuration

Critical traces may be lost accidentally.

Large Span Payloads

Excessive metadata impacts performance.

Real-World Enterprise Use Cases

E-Commerce Systems

  • Checkout flow analysis
  • Payment latency debugging
  • Inventory synchronization tracing

Banking Platforms

  • Transaction traceability
  • Fraud investigation
  • Audit workflows

Streaming Platforms

  • Recommendation pipeline analysis
  • Media delivery debugging
  • Event processing visibility

Interview Questions and Answers

What is distributed tracing?

Distributed tracing tracks requests across multiple distributed services using trace IDs and spans.

What is a span?

A span represents a single operation inside a distributed trace.

What is the purpose of Zipkin?

Zipkin collects, stores, and visualizes distributed traces.

What replaced Spring Cloud Sleuth?

Micrometer Tracing replaced Spring Cloud Sleuth in Spring Boot 3.

Why is trace propagation important?

Trace propagation ensures requests remain connected across distributed services.

What is trace sampling?

Sampling limits how many traces are collected to reduce overhead and storage costs.

Frequently Asked Questions

Is distributed tracing only for microservices?

No. Even monolithic systems can benefit from tracing complex workflows.

Can Zipkin work with Kubernetes?

Yes. Zipkin is commonly deployed in Kubernetes environments.

Does tracing impact performance?

Yes, but proper sampling and optimization minimize the overhead.

Can tracing work with Kafka?

Yes. Trace context can propagate through Kafka message headers.

What is the difference between metrics and traces?

Metrics provide aggregated system measurements, while traces show individual request journeys.

Should production systems trace every request?

Usually no. Most enterprise systems use sampling strategies.

Summary

Distributed tracing is one of the most important observability techniques in modern cloud-native systems.

In this guide, you learned:

  • How traces and spans work
  • How Micrometer Tracing integrates with Spring Boot
  • How Zipkin visualizes distributed traces
  • How context propagation works
  • How tracing supports REST APIs and Kafka systems
  • How enterprises debug distributed systems
  • Production observability best practices

Modern distributed systems cannot operate reliably without strong observability.

Distributed tracing enables:

  • Faster debugging
  • Better reliability
  • Performance optimization
  • Improved incident response
  • Deep operational visibility

Mastering observability and distributed tracing is essential for backend engineers, platform engineers, SREs, and cloud-native architects.

About the Author

Naresh Kumar

Senior Java Backend Engineer experienced in Banking, Payments, ISO 20022, Spring Boot, Microservices, Kafka, Docker, Kubernetes, AWS and Cloud Native Systems.

Built enterprise payment solutions, transaction processing systems, API platforms and scalable microservices used in production.
