Published: 2026-06-01 • Updated: 2026-06-20

Implementing Client-Side Load Balancing with Spring Cloud LoadBalancer

An enterprise-grade, deep-dive guide to designing, configuring, and scaling resilient client-side load balancing in modern microservices architectures using Spring Cloud LoadBalancer, WebClient, and Service Discovery.

What is Client-Side Load Balancing?

Definition: Client-Side Load Balancing is a distributed architectural pattern where the calling service (the client) is responsible for choosing which instance of the target service to send a request to. Instead of routing traffic through a centralized, middle-tier hardware or software proxy (like NGINX, HAProxy, or an AWS Application Load Balancer), the client queries a service registry (such as Eureka or Consul) to obtain a list of healthy service instances, caches this list locally, and applies a local routing algorithm (e.g., Round Robin, Random, or Weighted) to dispatch the request directly to an upstream instance.

This decentralized approach removes the central proxy as a single point of failure in the request path, reduces network latency by bypassing middle-tier proxies, and gives clients fine-grained control over routing decisions based on local context, network conditions, or custom business logic.

What You Will Learn

  • The core architectural differences between server-side and client-side load balancing.
  • Why Spring Cloud LoadBalancer (SCLB) is the modern, reactive successor to Netflix Ribbon.
  • The internal mechanics of SCLB, including its caching, reactive stream integrations, and discovery client bindings.
  • How to implement SCLB in Spring Boot 3.x using both blocking (RestTemplate) and non-blocking (WebClient) HTTP clients.
  • How to write custom, production-grade routing algorithms, such as same-zone affinity and weighted random routing.
  • How to configure advanced enterprise resilience patterns like retry mechanisms and circuit breakers.
  • How to monitor, debug, and troubleshoot SCLB in high-traffic enterprise environments.

Prerequisites

To fully benefit from this guide, you should have a solid foundation in the following areas:

  • Java Ecosystem: Java 17 or 21 (LTS versions) and Spring Boot 3.x.
  • Service Discovery: Familiarity with Spring Cloud Discovery concepts (specifically Netflix Eureka or HashiCorp Consul). If you need a refresher, refer to our guide on Service Discovery with Netflix Eureka.
  • Reactive Programming: Basic understanding of Project Reactor (Mono, Flux) is highly recommended, as Spring Cloud LoadBalancer is natively reactive.
  • Build Tools: Apache Maven or Gradle for dependency management.

Architectural Deep Dive: Server-Side vs. Client-Side Load Balancing

Understanding when and why to use client-side load balancing requires analyzing how it compares to traditional server-side load balancing. Both patterns aim to distribute traffic across redundant servers, but they do so at different points in the network topology.

1. Server-Side Load Balancing

In a server-side load-balancing architecture, all incoming client requests are sent to a centralized load balancer proxy. This proxy manages the state of the backend instances, checks their health, and forwards the request to an appropriate instance. The client has zero visibility into the topology of the backend service.

+--------------+               +---------------+               +------------------+
|              |  HTTP Request |  Centralized  |  Route Req    | Service Instance |
|  API Client  | ------------> | Server-Side   | ------------> |      (Node A)    |
|              |               | Load Balancer |               +------------------+
+--------------+               | (e.g., NGINX, |               +------------------+
                               |  AWS ALB)     |  Route Req    | Service Instance |
                               +---------------+ ------------> |      (Node B)    |
                                                               +------------------+
    

Advantages:

  • Simple Clients: Clients require no special configuration, libraries, or knowledge of the infrastructure. They simply target a single, stable DNS name.
  • Centralized Security & SSL Termination: Firewalls, WAFs, and SSL certificates can be managed at a single, centralized proxy layer.
  • Language Agnostic: Works across any programming language or framework since routing is handled at the network level.

Disadvantages:

  • Single Point of Failure (SPOF): If the centralized load balancer fails or becomes misconfigured, the entire system goes down.
  • Network Latency: Every request must hop through an intermediate proxy, introducing additional network latency (extra TCP handshake, packet parsing, and serialization overhead).
  • Scaling Bottleneck: Centralized load balancers must be scaled vertically or horizontally to handle massive spikes in traffic, increasing operational complexity and infrastructure costs.

2. Client-Side Load Balancing

In client-side load balancing, the centralized proxy is eliminated. The client microservice itself is responsible for routing decisions. It queries a Service Registry (which acts as a dynamic phone book) to discover the network locations (IPs and ports) of all available instances of the target service. The client then caches this list and uses an internal library to distribute calls across those instances.

+------------------+               1. Fetch Instances               +------------------+
|                  | <============================================> | Service Registry |
|                  |                                                | (Eureka/Consul)  |
|  Client Service  |                                                +------------------+
|  (with SCLB)     |               2. Direct HTTP Call
|                  | ---------------------------------------------+
+------------------+                                              |
     |                                                            v
     | Local Routing Algorithm                             +------------------+
     +--- Round Robin -----------------------------------> | Service Instance |
     |                                                     |      (Node A)    |
     +--- Same-Zone Priority -----------------------------+ +------------------+
                                                           |
                                                           v
                                                   +------------------+
                                                   | Service Instance |
                                                   |      (Node B)    |
                                                   +------------------+
    

Advantages:

  • No Intermediate Network Hops: Requests go directly from the client to the target instance, maximizing throughput and minimizing latency.
  • No Centralized SPOF: Even if the Service Registry goes down, clients can continue functioning using their locally cached instance lists.
  • Dynamic & Context-Aware Routing: Clients can apply highly customizable routing rules. For example, a client can choose to only call instances running in the same cloud availability zone to avoid cross-zone data transfer costs, or route requests to specific instances based on the current user's location or request headers (Canary releases).
  • High Scalability: The load balancing overhead is completely distributed across all client nodes, scaling naturally as the application grows.

Disadvantages:

  • Heavy Clients: Clients must run specialized libraries (like Spring Cloud LoadBalancer), increasing memory footprint and configuration complexity.
  • Language/Framework Lock-in: Implementing robust, identical client-side load balancing logic across polyglot microservices (e.g., Java, Go, Node.js) is highly challenging without a service mesh (like Istio or Linkerd).
  • Registry Dependency: Relies heavily on a highly available Service Registry to maintain accurate, up-to-date health states of all instances.

Comparison Matrix

Feature                  | Server-Side Load Balancing             | Client-Side Load Balancing
-------------------------+----------------------------------------+---------------------------------------------------
Routing Decision Point   | Central Proxy (NGINX, F5, AWS ALB)     | Inside the Client JVM/Process
Network Hops             | 2 hops (Client -> Proxy -> Server)     | 1 hop (Client -> Server)
Single Point of Failure  | Yes (the central proxy)                | No (distributed; registry is control-plane only)
Configuration Complexity | Low (infrastructure level)             | High (application level)
Resource Consumption     | High on proxy hardware                 | Distributed across client microservices
Contextual Routing       | Limited (based on HTTP headers, paths) | Excellent (can access full JVM state, local configs)

Why Spring Cloud LoadBalancer Over Netflix Ribbon?

For several years, Netflix Ribbon was the de facto client-side load balancer in the Spring Cloud ecosystem. However, Netflix placed Ribbon into maintenance mode in 2018. To address this, the Spring Cloud team engineered a modern, cloud-native replacement: Spring Cloud LoadBalancer (SCLB).

There are several critical architectural reasons why modern enterprise applications must migrate away from Ribbon to SCLB:

  • Blocking vs. Reactive Stack: Ribbon was designed around a blocking, thread-per-request model. It relies heavily on ThreadLocal variables to pass context down the execution stack. This design is completely incompatible with modern, non-blocking reactive stacks like Spring WebFlux and Project Reactor. Spring Cloud LoadBalancer is built from the ground up using Project Reactor, making it fully non-blocking, highly performant, and compatible with both reactive (WebFlux) and servlet-based (MVC) applications.
  • Dependency Bloat & Archaius Coupling: Ribbon is tightly coupled to various legacy Netflix libraries, most notably Archaius for configuration management. Archaius introduces substantial dependency overhead, relies heavily on static configuration singletons, and is difficult to integrate with modern externalized configuration providers (like Spring Cloud Config, Kubernetes ConfigMaps, or HashiCorp Vault). SCLB integrates natively with standard Spring Boot properties, configurations, and profiles.
  • Active Maintenance and Ecosystem Alignment: Ribbon is no longer actively developed. Spring Cloud LoadBalancer is actively maintained, receives security patches, and is deeply integrated with the modern Spring Cloud stack (including Spring Cloud Gateway, OpenFeign, and Spring Cloud Consul/Eureka/ZooKeeper).

Spring Cloud LoadBalancer Internal Architecture

To write highly customized load-balancing rules and troubleshoot runtime issues, we must understand the core components and interfaces that drive Spring Cloud LoadBalancer.

+---------------------------------------------------------------------------------+
|                           ReactiveLoadBalancer Client                           |
|                                                                                 |
|   +-------------------------------------------------------------------------+   |
|   |                      ServiceInstanceListSupplier                        |   |
|   |  - Fetches healthy instances from Service Discovery                     |   |
|   |  - Applies local caching (delegated by CacheManager)                    |   |
|   +-------------------------------------------------------------------------+   |
|                                        |                                        |
|                                        v                                        |
|   +-------------------------------------------------------------------------+   |
|   |                    ReactorServiceInstanceLoadBalancer                   |   |
|   |  - Executes routing algorithm (RoundRobin, Random, or Custom)           |   |
|   |  - Returns chosen ServiceInstance wrapped in a Mono                     |   |
|   +-------------------------------------------------------------------------+   |
|                                        |                                        |
+----------------------------------------|----------------------------------------+
                                         v
                      +--------------------------------------+
                      | Target Service Instance (HTTP Call)  |
                      +--------------------------------------+
    

1. Core Interfaces and Components

  • ReactiveLoadBalancer<T>: The root interface representing a reactive load balancer. Its choose(Request request) method returns a Reactive Streams Publisher<Response<T>>; the Reactor-specific sub-interface ReactorLoadBalancer<T> narrows the return type to a Mono:
    Mono<Response<T>> choose(Request request);
    For service routing, T is ServiceInstance, so this non-blocking method returns a Mono containing the chosen ServiceInstance.
  • ReactorServiceInstanceLoadBalancer: A marker sub-interface of ReactorLoadBalancer<ServiceInstance> (and therefore of ReactiveLoadBalancer). Spring Cloud provides two default implementations out of the box:
    • RoundRobinLoadBalancer: Selects instances sequentially using a thread-safe atomic counter. This is the default algorithm.
    • RandomLoadBalancer: Selects instances randomly, which is useful in large-scale environments where request distribution naturally evens out.
  • ServiceInstanceListSupplier: A critical supplier interface responsible for providing a list of available ServiceInstance objects. This interface acts as the bridge between the load balancer and the discovery client. Various decorator patterns are applied to this supplier to implement advanced features:
    • DiscoveryClientServiceInstanceListSupplier: Queries the active DiscoveryClient (e.g., Eureka) for instances.
    • CachingServiceInstanceListSupplier: Wraps another supplier and caches the retrieved list in memory to avoid hitting the discovery client on every single HTTP request.
    • ZonePreferenceServiceInstanceListSupplier: Filters instances to prioritize those running in the same zone as the client.
    • HealthCheckServiceInstanceListSupplier: Periodically executes active health checks against instances to verify their viability.
  • LoadBalancerClient: A blocking, legacy-compatible interface used by non-reactive clients (like RestTemplate or OpenFeign). Under the hood, it blocks the reactive ReactiveLoadBalancer to resolve the service instance synchronously.
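
To make the "thread-safe atomic counter" idea concrete, here is a minimal plain-Java sketch of round-robin selection. It is not the Spring Cloud implementation: the class name RoundRobinSketch is hypothetical, and instances are represented as simple host:port strings rather than ServiceInstance objects.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative round-robin selector; a sketch of the idea, not library code. */
final class RoundRobinSketch {

    // Shared counter; AtomicInteger makes concurrent increments safe without locks.
    private final AtomicInteger position = new AtomicInteger(0);

    /** Returns the next instance in rotation, or null when the list is empty. */
    String choose(List<String> instances) {
        if (instances.isEmpty()) {
            return null;
        }
        // Mask the sign bit so the index stays non-negative after integer overflow.
        int next = position.getAndIncrement() & Integer.MAX_VALUE;
        return instances.get(next % instances.size());
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch();
        List<String> instances = List.of("10.0.1.45:8082", "10.0.1.46:8082", "10.0.1.47:8082");
        // Four calls walk the list once and then wrap around to the first entry.
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.choose(instances));
        }
    }
}
```

Because the counter is shared across all calling threads, the rotation stays even under concurrency without any synchronized block.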

2. The Lifecycle of a Load-Balanced Request

  1. An HTTP client (e.g., WebClient or RestTemplate) intercepts an outgoing request targeting a virtual URI (e.g., http://order-service/orders/123).
  2. The client extracts the service ID (order-service) from the host portion of the URI.
  3. The client delegates to the LoadBalancerClientFactory to retrieve the specific ReactiveLoadBalancer instance dedicated to order-service.
  4. The load balancer requests the list of instances from its configured ServiceInstanceListSupplier.
  5. The ServiceInstanceListSupplier checks its local cache. If the cache has expired, it queries the underlying Service Registry (e.g., Eureka), updates the cache, and returns the list.
  6. The load balancer executes its selection algorithm (e.g., Round Robin) on the list of healthy instances.
  7. The chosen ServiceInstance (containing the actual IP and port) is returned.
  8. The HTTP client reconstructs the URI (e.g., http://10.0.1.45:8082/orders/123) and executes the actual network call.
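
Steps 2 and 8 of the lifecycle above can be sketched with nothing more than java.net.URI. This is an illustrative approximation of the idea, not the code Spring Cloud runs internally; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative helper showing how a virtual URI maps to a physical one. */
final class UriReconstructionSketch {

    /** Step 2: the host part of the virtual URI is treated as the service ID. */
    static String serviceId(URI virtualUri) {
        return virtualUri.getHost();
    }

    /** Step 8: swap the logical host for the chosen instance's host and port. */
    static URI reconstruct(URI virtualUri, String host, int port) {
        try {
            return new URI(virtualUri.getScheme(), virtualUri.getUserInfo(), host, port,
                    virtualUri.getPath(), virtualUri.getQuery(), virtualUri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI for " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) {
        URI virtual = URI.create("http://order-service/orders/123");
        System.out.println(serviceId(virtual));                      // order-service
        System.out.println(reconstruct(virtual, "10.0.1.45", 8082)); // http://10.0.1.45:8082/orders/123
    }
}
```

The path, query string, and fragment are carried over unchanged; only the authority part of the URI is replaced.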

Step-by-Step Implementation Guide

Let's build a complete, production-ready microservices setup demonstrating client-side load balancing using Spring Boot 3.x, Spring Cloud, Eureka Service Discovery, and both blocking (RestTemplate) and non-blocking (WebClient) architectures.

Step 1: Parent Pom and Dependency Management

First, ensure your Maven pom.xml is configured with the correct versions for Spring Boot and Spring Cloud. We will use Spring Boot 3.2.x and Spring Cloud 2023.0.x (the "Leyton" release train).

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.enterprise.microservices</groupId>
    <artifactId>loadbalancer-demo</artifactId>
    <version>1.0.0</version>
    <packaging>pom</packaging>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.3</version>
        <relativePath/>
    </parent>

    <properties>
        <java.version>17</java.version>
        <spring-cloud.version>2023.0.0</spring-cloud.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
</project>

Step 2: Client Service Dependencies

In your client microservice (e.g., customer-service), include the Eureka Client, Spring Cloud LoadBalancer, and Spring Web/WebFlux dependencies.

<dependencies>
    <!-- Spring Boot Web for REST Controllers -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Spring Boot WebFlux for Reactive WebClient -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>

    <!-- Spring Cloud Service Discovery Eureka Client -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
    </dependency>

    <!-- Spring Cloud LoadBalancer -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-loadbalancer</artifactId>
    </dependency>

    <!-- Spring Boot Actuator for Metrics and Health -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
</dependencies>

Step 3: Application Configuration

Configure your application.yml file to connect to Eureka and tune the caching behavior of Spring Cloud LoadBalancer. Caching is critical in production to ensure that SCLB does not saturate the Eureka server with continuous discovery queries.

server:
  port: 8080

spring:
  application:
    name: customer-service
  # Spring Cloud LoadBalancer Fine-Tuning
  # (kept under the single top-level "spring" key; duplicate top-level keys are invalid YAML)
  cloud:
    loadbalancer:
      # Enable caching for high throughput
      cache:
        enabled: true
        # Time-to-live for cached service instances (default is 35s)
        ttl: 15s
        # Maximum number of service cache entries
        capacity: 256
      # Active health-check settings; these only take effect when a
      # health-check instance supplier is configured for the service
      health-check:
        interval: 10s
        path:
          # "default" applies to every service without its own entry
          default: /actuator/health

eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
    registry-fetch-interval-seconds: 30
  instance:
    prefer-ip-address: true
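
To illustrate what a time-to-live setting means for the instance list, here is a small plain-Java sketch of a TTL cache in front of a registry lookup. It is a simplified stand-in, not the caching supplier shipped with Spring Cloud; the class name, the Supplier that plays the role of the registry, and the injectable clock are all assumptions made for the example.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative TTL cache for a list of instances; a sketch, not library code. */
final class TtlInstanceCacheSketch {

    private final Supplier<List<String>> registryLookup; // stands in for a discovery query
    private final long ttlNanos;
    private final LongSupplier nanoClock;                // injectable so expiry is testable

    private List<String> cached;
    private long loadedAtNanos;

    TtlInstanceCacheSketch(Supplier<List<String>> registryLookup, Duration ttl, LongSupplier nanoClock) {
        this.registryLookup = registryLookup;
        this.ttlNanos = ttl.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Returns the cached list, refreshing it from the registry once the TTL has elapsed. */
    synchronized List<String> get() {
        long now = nanoClock.getAsLong();
        if (cached == null || now - loadedAtNanos >= ttlNanos) {
            cached = List.copyOf(registryLookup.get());
            loadedAtNanos = now;
        }
        return cached;
    }

    public static void main(String[] args) {
        TtlInstanceCacheSketch cache = new TtlInstanceCacheSketch(
                () -> List.of("10.0.1.45:8082", "10.0.1.46:8082"),
                Duration.ofSeconds(15),
                System::nanoTime);
        System.out.println(cache.get()); // first call hits the registry stand-in
        System.out.println(cache.get()); // second call is served from the cache
    }
}
```

A shorter TTL means clients notice registry changes sooner but query the registry more often; a longer TTL trades freshness for fewer lookups.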

Step 4: Configuring Load-Balanced RestTemplate and WebClient

To register load-balanced HTTP clients, create a configuration class. The key annotation here is @LoadBalanced. When applied to a RestTemplate bean or a WebClient.Builder bean, Spring injects custom interceptors and filter functions that handle the load-balancing lifecycle.

package com.enterprise.microservices.loadbalancer.config;

import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class HttpClientConfig {

    /**
     * Configures a load-balanced RestTemplate.
     * The @LoadBalanced annotation intercepts requests and applies LoadBalancerInterceptor.
     */
    @Bean
    @LoadBalanced
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }

    /**
     * Configures a load-balanced WebClient Builder.
     * The @LoadBalanced annotation binds DeferringLoadBalancerExchangeFilterFunction.
     */
    @Bean
    @LoadBalanced
    public WebClient.Builder webClientBuilder() {
        return WebClient.builder();
    }

    /**
     * Instantiates the WebClient instance using the load-balanced builder.
     */
    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder.build();
    }
}

Step 5: Consuming Services Using Virtual URIs

Now, let's write a service layer that communicates with an external microservice named order-service. Instead of hardcoding physical IPs, we will use the logical application name of the target service in our URIs.

package com.enterprise.microservices.loadbalancer.service;

import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class OrderServiceClient {

    private final RestTemplate restTemplate;
    private final WebClient webClient;

    // The virtual base URI targeting the logical service name registered in Eureka
    private static final String ORDER_SERVICE_URI = "http://order-service";

    public OrderServiceClient(RestTemplate restTemplate, WebClient webClient) {
        this.restTemplate = restTemplate;
        this.webClient = webClient;
    }

    /**
     * Example using the blocking RestTemplate.
     */
    public String getOrdersBlocking(String customerId) {
        // A URI template variable lets RestTemplate encode customerId safely
        String url = ORDER_SERVICE_URI + "/orders?customerId={customerId}";
        // RestTemplate will intercept this, resolve "order-service" via SCLB, and call the chosen node
        return restTemplate.getForObject(url, String.class, customerId);
    }

    /**
     * Example using the non-blocking reactive WebClient.
     */
    public Mono<String> getOrdersReactive(String customerId) {
        return webClient.get()
                // The template variable is expanded and encoded by WebClient
                .uri(ORDER_SERVICE_URI + "/orders?customerId={customerId}", customerId)
                .retrieve()
                .bodyToMono(String.class);
    }
}

Customizing the Load Balancing Algorithm

While the default Round Robin algorithm is sufficient for simple deployments, production enterprise environments often require custom routing behaviors. For example, you might want a Random Load Balancer or a specialized algorithm based on latency or instance weights.
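
As a rough illustration of what "instance weights" could mean, the following plain-Java sketch picks an instance with probability proportional to its weight. It is independent of Spring Cloud; the class name, the map of instance to weight, and the explicit roll parameter (exposed so the selection is deterministic in tests) are assumptions for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative weighted random selection; a sketch, not library code. */
final class WeightedRandomSketch {

    /** Picks the entry whose cumulative weight range contains roll (0 <= roll < total weight). */
    static String pick(Map<String, Integer> weights, int roll) {
        int cumulative = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            if (roll < cumulative) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("roll must be below the total weight");
    }

    /** Draws a uniform roll and delegates to pick. */
    static String pickRandom(Map<String, Integer> weights) {
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        if (total <= 0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        return pick(weights, ThreadLocalRandom.current().nextInt(total));
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("10.0.1.45:8082", 3); // receives roughly 75% of calls
        weights.put("10.0.1.46:8082", 1); // receives roughly 25% of calls
        System.out.println(pickRandom(weights));
    }
}
```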

To customize the load-balancing behavior for a specific service, we must define a custom configuration class. Crucial Rule: This custom configuration class must not be annotated with @Configuration in a way that allows it to be scanned by the main @ComponentScan. If it is globally scanned, it will override the default load balancer configurations for all services, which is rarely the desired outcome.

1. Creating the Custom Load Balancer Configuration

Below is a configuration class that overrides the default Round Robin strategy and replaces it with a RandomLoadBalancer.

package com.enterprise.microservices.loadbalancer.config;

import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.loadbalancer.core.RandomLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ReactorLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.cloud.loadbalancer.support.LoadBalancerClientFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.core.env.Environment;

/**
 * Custom configuration for a specific service's load balancer.
 * Notice that we do NOT mark this with @Configuration to prevent global component scanning.
 */
public class CustomLoadBalancerConfig {

    @Bean
    public ReactorLoadBalancer<ServiceInstance> customLoadBalancer(
            Environment environment,
            LoadBalancerClientFactory loadBalancerClientFactory) {
        
        // Retrieve the name of the service for which this load balancer is being created
        String name = environment.getProperty(LoadBalancerClientFactory.PROPERTY_NAME);
        
        // Return a Random Load Balancer instead of Round Robin
        return new RandomLoadBalancer(
                loadBalancerClientFactory.getLazyProvider(name, ServiceInstanceListSupplier.class),
                name
        );
    }
}

2. Declaring the Custom Client

Now, in your main application class or any scanned configuration class, bind this custom configuration to a specific target microservice using the @LoadBalancerClient annotation.

package com.enterprise.microservices.loadbalancer;

import com.enterprise.microservices.loadbalancer.config.CustomLoadBalancerConfig;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.loadbalancer.annotation.LoadBalancerClient;

@SpringBootApplication
@EnableDiscoveryClient
// Apply the custom configuration ONLY to calls targeting "order-service"
@LoadBalancerClient(name = "order-service", configuration = CustomLoadBalancerConfig.class)
public class CustomerServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(CustomerServiceApplication.class, args);
    }
}

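If you have multiple services requiring different configurations, you can group several declarations inside the @LoadBalancerClients container annotation (AnotherCustomConfig below is a placeholder for a second configuration class of your own):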

@LoadBalancerClients({
    @LoadBalancerClient(name = "order-service", configuration = CustomLoadBalancerConfig.class),
    @LoadBalancerClient(name = "inventory-service", configuration = AnotherCustomConfig.class)
})

Advanced Enterprise Patterns

In massive cloud deployments (e.g., AWS, GCP, or multi-region Kubernetes), basic load balancing is insufficient. Let's look at how to implement highly resilient, cost-effective routing strategies.

Pattern 1: Zone-Preference Routing (AWS Availability Zone Alignment)

In cloud-native architectures, sending data across Availability Zones (AZs) typically incurs inter-AZ data transfer charges and adds network latency compared with staying inside one zone. To optimize performance and reduce cloud spend, you should configure your microservices to route traffic to upstream instances running in their own AZ first. If no instances exist in the same AZ, SCLB should fail over to other zones.

+-------------------------------------------------------------------------+
| AWS Availability Zone: us-east-1a                                       |
|                                                                         |
|  +--------------------+  (Local Low-Latency Call)  +-----------------+  |
|  | Customer-Service   | -------------------------> | Order-Service   |  |
|  | (Client Node)      |                            | (Instance A)    |  |
|  +--------------------+                            +-----------------+  |
+-------------------------------------------------------------------------+
     |
     | (Failover Cross-Zone Call - only if Local Instance A is down)
     v
+-------------------------------------------------------------------------+
| AWS Availability Zone: us-east-1b                                       |
|                                                                         |
|                                                    +-----------------+  |
|                                                    | Order-Service   |  |
|                                                    | (Instance B)    |  |
|                                                    +-----------------+  |
+-------------------------------------------------------------------------+
    

To enable zone-preference routing, we can configure our custom load balancer to use Spring's ZonePreferenceServiceInstanceListSupplier. First, ensure your service instances register their availability zone in their metadata (e.g., in Eureka metadata).

Here is how to configure Eureka Client to register zone metadata in application.yml:

eureka:
  instance:
    metadata-map:
      zone: us-east-1a # Extracted dynamically from cloud metadata in production

Next, instantiate the zone-aware list supplier in your custom load balancer configuration:

package com.enterprise.microservices.loadbalancer.config;

import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.Bean;

public class ZoneAwareLoadBalancerConfig {

    @Bean
    public ServiceInstanceListSupplier discoveryClientServiceInstanceListSupplier(
            ConfigurableApplicationContext context) {

        // Compose the supplier chain with the builder API instead of calling the
        // supplier constructors directly (they do not accept an application context).
        return ServiceInstanceListSupplier.builder()
                // Base supplier backed by the reactive discovery client;
                // withBlockingDiscoveryClient() is the servlet-stack alternative
                .withDiscoveryClient()
                // Cache the discovered list in memory
                .withCaching()
                // Wrap with the ZonePreference decorator
                .withZonePreference()
                .build(context);
    }
}
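
The filtering rule described for this pattern (prefer same-zone instances, fall back to every zone when none match) can be sketched in plain Java as follows. This is a simplified illustration, not the decorator's source; the Instance record and the preferZone method are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative same-zone filter with cross-zone fallback; a sketch, not library code. */
final class ZonePreferenceSketch {

    /** Minimal stand-in for a service instance carrying zone metadata. */
    record Instance(String address, String zone) {}

    /** Keeps only instances in clientZone, or returns the full list when none match. */
    static List<Instance> preferZone(List<Instance> all, String clientZone) {
        if (clientZone == null) {
            return all; // the client's zone is unknown, so nothing can be preferred
        }
        List<Instance> sameZone = all.stream()
                .filter(instance -> clientZone.equals(instance.zone()))
                .collect(Collectors.toList());
        // Fail over to every zone rather than returning an empty list
        return sameZone.isEmpty() ? all : sameZone;
    }

    public static void main(String[] args) {
        List<Instance> all = List.of(
                new Instance("10.0.1.45:8082", "us-east-1a"),
                new Instance("10.0.2.17:8082", "us-east-1b"));
        System.out.println(preferZone(all, "us-east-1a")); // only the us-east-1a instance
        System.out.println(preferZone(all, "us-east-1c")); // both instances (fallback)
    }
}
```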

Pattern 2: Dynamic Canary Routing (Header-Based Instancing)

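For zero-downtime canary deployments, you might want to route a small percentage of traffic (or specific beta users) to a new version of your service (e.g., v2.0.0) while the rest of the traffic continues targeting v1.0.0. We can achieve this by implementing a custom ReactorServiceInstanceLoadBalancer that reads the headers of the outgoing load-balanced request. The load balancer only sees headers set on that client request, so a marker such as X-Beta-User that arrives on an inbound call must be copied onto the outgoing request by the calling code.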

package com.enterprise.microservices.loadbalancer.routing;

import org.springframework.beans.factory.ObjectProvider;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.loadbalancer.DefaultResponse;
import org.springframework.cloud.client.loadbalancer.Request;
import org.springframework.cloud.client.loadbalancer.RequestDataContext;
import org.springframework.cloud.client.loadbalancer.Response;
import org.springframework.cloud.loadbalancer.core.ReactorServiceInstanceLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.http.HttpHeaders;
import reactor.core.publisher.Mono;

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

public class CanaryLoadBalancer implements ReactorServiceInstanceLoadBalancer {

    private final ObjectProvider<ServiceInstanceListSupplier> serviceInstanceListSupplierProvider;
    private final String serviceId;

    public CanaryLoadBalancer(ObjectProvider<ServiceInstanceListSupplier> supplierProvider, String serviceId) {
        this.serviceInstanceListSupplierProvider = supplierProvider;
        this.serviceId = serviceId;
    }

    @Override
    public Mono<Response<ServiceInstance>> choose(Request request) {
        ServiceInstanceListSupplier supplier = serviceInstanceListSupplierProvider.getIfAvailable();
        if (supplier == null) {
            return Mono.just(new DefaultResponse(null));
        }

        return supplier.get(request).next().map(instances -> processSelection(instances, request));
    }

    private Response<ServiceInstance> processSelection(List<ServiceInstance> instances, Request request) {
        if (instances.isEmpty()) {
            return new DefaultResponse(null);
        }

        // Extract HTTP headers from the request context
        HttpHeaders headers = null;
        if (request.getContext() instanceof RequestDataContext context) {
            headers = context.getClientRequest().getHeaders();
        }

        boolean routeToBeta = false;
        if (headers != null && headers.containsKey("X-Beta-User")) {
            routeToBeta = Boolean.parseBoolean(headers.getFirst("X-Beta-User"));
        }

        // Filter instances based on metadata tags
        final boolean finalRouteToBeta = routeToBeta;
        List<ServiceInstance> filteredInstances = instances.stream()
                .filter(inst -> {
                    String version = inst.getMetadata().getOrDefault("version", "v1");
                    return finalRouteToBeta ? "v2".equals(version) : "v1".equals(version);
                })
                .collect(Collectors.toList());

        // Fallback to all instances if no match is found
        if (filteredInstances.isEmpty()) {
            filteredInstances = instances;
        }

        // Apply a random selection among the qualified instances
        int index = ThreadLocalRandom.current().nextInt(filteredInstances.size());
        return new DefaultResponse(filteredInstances.get(index));
    }
}
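
The balancer above covers the beta-user header case. For the percentage-based case mentioned at the start of this pattern, one common approach is deterministic bucketing: hash a stable key into one of 100 buckets and send the caller to the canary when the bucket falls below the rollout percentage. The following is a minimal, framework-free sketch under stated assumptions; the class name CanaryBucketing, the method routeToCanary, and the choice of a user ID as the key are hypothetical and are not part of Spring Cloud LoadBalancer.

```java
/** Hypothetical helper: decides whether a caller belongs to the canary cohort. */
final class CanaryBucketing {

    private CanaryBucketing() {
    }

    /**
     * Maps a stable key (for example a user ID) to a bucket in [0, 100) and
     * returns true when that bucket falls below the canary percentage.
     * The same key always yields the same decision for a given percentage.
     */
    static boolean routeToCanary(String stableKey, int canaryPercent) {
        if (canaryPercent < 0 || canaryPercent > 100) {
            throw new IllegalArgumentException("canaryPercent must be between 0 and 100");
        }
        // floorMod keeps the bucket non-negative even when hashCode() is negative
        int bucket = Math.floorMod(stableKey.hashCode(), 100);
        return bucket < canaryPercent;
    }
}
```

Inside a custom load balancer, the result of such a check could replace the routeToBeta flag. Because the decision depends only on the key, a given user stays on the same version as long as the percentage is unchanged, which a per-request random draw would not guarantee.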

Pattern 3: Integrating Client-Side Retries with Resilience4j

In ephemeral cloud environments, transient network failures are common. If a load-balanced request fails because of a socket timeout or a refused connection, a retry sends the next attempt back through the load balancer, which can route it to a different service instance so that a single unhealthy instance does not fail the whole call.

To implement this, integrate Resilience4j into your load-balanced client. Add the dependency (the @Retry annotation also needs spring-boot-starter-aop on the classpath at runtime):

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.1.0</version>
</dependency>

Next, configure a retry policy in your application.yml:

resilience4j:
  retry:
    instances:
      orderServiceRetry:
        max-attempts: 3
        wait-duration: 500ms
        # Retry on standard network I/O exceptions
        retry-exceptions:
          - java.io.IOException
          - org.springframework.web.client.ResourceAccessException
          - org.springframework.web.reactive.function.client.WebClientRequestException

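Apply the retry decorator to your client calls. Each retry attempt passes through the load-balanced RestTemplate again, so SCLB selects an instance anew. With the default round-robin algorithm and more than one healthy instance, the next attempt typically lands on a different instance, although this is not guaranteed: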

package com.enterprise.microservices.loadbalancer.service;

import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class ResilientOrderClient {

    private final RestTemplate restTemplate;

    public ResilientOrderClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Retry(name = "orderServiceRetry", fallbackMethod = "fallbackGetOrders")
    public String getOrdersWithRetry(String customerId) {
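        // Each retry re-enters the load-balanced RestTemplate, so SCLB selects an instance again
        // (typically a different one under round robin). The URI variable is expanded and encoded by RestTemplate.
        return restTemplate.getForObject("http://order-service/orders?customerId={customerId}", String.class, customerId);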
    }

    public String fallbackGetOrders(String customerId, Throwable ex) {
        return "Fallback: Order Service is currently unavailable. Error: " + ex.getMessage();
    }
}
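
To make the interaction between retries and instance selection concrete, here is a framework-free simulation, assuming a round-robin selector and a retry loop that asks the selector for an instance before every attempt. RoundRobinChooser and RetryingCaller are hypothetical names used only for illustration. Real SCLB and Resilience4j add backoff, exception filtering, and concurrent callers that advance the same counter, so treat this as a sketch of the idea rather than of either library's implementation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical round-robin chooser over a fixed list of instance names. */
final class RoundRobinChooser {
    private final List<String> instances;
    private final AtomicInteger position = new AtomicInteger();

    RoundRobinChooser(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = List.copyOf(instances);
    }

    String choose() {
        // floorMod guards against the counter overflowing to a negative value
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}

/** Hypothetical retry loop that re-selects an instance before every attempt. */
final class RetryingCaller {
    private final RoundRobinChooser chooser;
    private final int maxAttempts;

    RetryingCaller(RoundRobinChooser chooser, int maxAttempts) {
        this.chooser = chooser;
        this.maxAttempts = maxAttempts;
    }

    String call(Function<String, String> request) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String instance = chooser.choose(); // selection happens once per attempt
            try {
                return request.apply(instance);
            } catch (RuntimeException ex) {
                lastFailure = ex; // remember the failure and move on to the next attempt
            }
        }
        throw new IllegalStateException("All " + maxAttempts + " attempts failed", lastFailure);
    }
}
```

With two instances where the first one refuses connections, the first attempt fails and the second attempt reaches the healthy instance. With a single registered instance, every attempt hits the same target, which is why retries alone do not replace health checks.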

Operational Concerns, Metrics, and Observability

In production-scale systems, implementing client-side load balancing is only the beginning. Enterprise teams must also ensure visibility into routing decisions, retry behavior, request latency, and service health.

1. Enabling Actuator Metrics

Spring Boot Actuator combined with Micrometer provides production-grade observability for Spring Cloud LoadBalancer.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Expose actuator endpoints:

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus

  endpoint:
    health:
      show-details: always

  metrics:
    tags:
      application: customer-service

2. Important Metrics to Monitor

Metric | Description
--- | ---
http.client.requests | Total outbound HTTP requests
resilience4j.retry.calls | Retry attempt statistics
reactor.netty.connection.provider.active.connections | Active HTTP client connections
loadbalancer.requests.success | Successful load-balanced requests
loadbalancer.requests.failed | Failed routing attempts

3. Distributed Tracing

In distributed microservices systems, debugging routing problems without tracing is nearly impossible. Integrate Micrometer Tracing (or OpenTelemetry) with a backend such as Zipkin or Jaeger. The dependencies below use the Brave bridge and report spans to Zipkin:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>

<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>

This enables engineers to trace requests flowing across multiple load-balanced services, retries, circuit breakers, and fallback paths.

4. Logging Load-Balancing Decisions

During troubleshooting, enable DEBUG logs for load-balancer internals.

logging:
  level:
    org.springframework.cloud.loadbalancer: DEBUG
    reactor.netty.http.client: DEBUG

Depending on your configuration and library versions, these logs can reveal:

  • Selected service instance
  • Retry attempts
  • Discovery refresh operations
  • Health-check failures
  • Cache invalidation events

Troubleshooting and Common Pitfalls

Problem 1: No Servers Available for Service

java.lang.IllegalStateException: No instances available for order-service

Possible Causes:

  • Target service is not registered in Eureka
  • Service name mismatch
  • Eureka server unavailable
  • Instance marked DOWN due to failed health checks

Solution:

  • Verify Eureka dashboard registrations
  • Ensure spring.application.name matches exactly
  • Validate actuator health endpoint availability

Problem 2: Requests Always Hit Same Instance

Possible Causes:

  • Only one instance running
  • Caching stale instance list
  • Sticky sessions enabled externally

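Solution: confirm that more than one instance is registered, then shorten the SCLB cache TTL (35s by default) so that registry changes reach the client sooner: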

spring:
  cloud:
    loadbalancer:
      cache:
        ttl: 5s
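
To see why the TTL bounds how stale the instance list can get, consider this framework-free sketch of a TTL-based cache in front of a registry lookup. TtlInstanceCache, its constructor parameters, and the injected clock are hypothetical names chosen for illustration. SCLB's own caching works differently in detail, so this only models the trade-off between freshness and registry load.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical TTL cache: reloads the instance list once the cached copy expires. */
final class TtlInstanceCache {
    private final Supplier<List<String>> registryLookup;
    private final LongSupplier clockMillis;
    private final long ttlMillis;
    private List<String> cached;
    private long loadedAtMillis;

    TtlInstanceCache(Supplier<List<String>> registryLookup, Duration ttl, LongSupplier clockMillis) {
        this.registryLookup = registryLookup;
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    synchronized List<String> instances() {
        long now = clockMillis.getAsLong();
        // Reload on first use or when the cached list is at least one TTL old
        if (cached == null || now - loadedAtMillis >= ttlMillis) {
            cached = List.copyOf(registryLookup.get());
            loadedAtMillis = now;
        }
        return cached;
    }
}
```

In this model a new or removed instance stays invisible to the client for at most one TTL. A shorter TTL improves freshness at the cost of more frequent registry lookups, and it cannot make the list fresher than the discovery client's own refresh cycle.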

Problem 3: Reactive Context Lost During Retries

Retry logic that blocks, or that re-subscribes outside the reactive pipeline, can lose the Reactor Context, which may carry trace IDs and security details.

Best Practice:

  • Use Reactor-native retry operators such as retryWhen with reactor.util.retry.Retry
  • Avoid blocking operations inside reactive flows
  • Never call .block() in reactive pipelines

Problem 4: Cross-Zone Traffic Costs Increasing

Cause:

  • Zone metadata missing from service registration

Solution:

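  • Enable zone-aware routing (for example, set spring.cloud.loadbalancer.zone on the client and select the zone-preference configuration)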
  • Verify Eureka metadata propagation
  • Use same-zone affinity strategies

Technical Interview Questions & Answers

1. What is client-side load balancing?

Client-side load balancing is a distributed routing pattern where the client application itself selects the target service instance using a local routing algorithm and service discovery information.

2. What is the difference between Ribbon and Spring Cloud LoadBalancer?

Ribbon | Spring Cloud LoadBalancer
--- | ---
Netflix-maintained | Spring-maintained
Blocking architecture | Reactive-first architecture
Maintenance mode | Actively developed
Archaius dependency | Native Spring Boot integration

3. How does Spring Cloud LoadBalancer work internally?

It retrieves healthy service instances from the discovery client using ServiceInstanceListSupplier, applies a routing algorithm through ReactorServiceInstanceLoadBalancer, and forwards the request to the selected instance.

4. What is the default algorithm in SCLB?

Round Robin is the default load-balancing algorithm.

5. Can Spring Cloud LoadBalancer work without Eureka?

Yes. It supports multiple discovery providers such as:

  • Consul
  • Kubernetes
  • Zookeeper
  • Static service lists

6. Why is WebClient preferred over RestTemplate?

WebClient is non-blocking and reactive, so a small number of threads can serve many concurrent requests. RestTemplate blocks the calling thread for the duration of each request, which limits throughput under high concurrency.

Frequently Asked Questions (FAQs)

Is Spring Cloud LoadBalancer production ready?

Yes. It is the officially recommended replacement for Netflix Ribbon and is widely used in enterprise cloud-native deployments.

Does SCLB support weighted routing?

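Yes. You can implement custom ReactorServiceInstanceLoadBalancer strategies using instance metadata and custom selection logic. Recent SCLB releases also provide a built-in weighted option, enabled by setting spring.cloud.loadbalancer.configurations to weighted; check the documentation for your release train before relying on it.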

Can I combine SCLB with Kubernetes?

Absolutely. SCLB integrates with Kubernetes service discovery, although many Kubernetes environments also rely on service mesh technologies like Istio or Linkerd.

Should I still use API Gateways with client-side load balancing?

Yes. API Gateways solve edge-routing concerns, authentication, rate limiting, and request aggregation, while SCLB handles internal service-to-service routing.

Can retries cause duplicate requests?

Yes. Retrying non-idempotent operations may produce duplicate side effects. Always design retry-safe APIs using idempotency keys or transactional safeguards.
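
As an illustration of the idempotency-key approach, the sketch below shows a server-side handler that executes an operation at most once per key and returns the stored result for repeated deliveries. IdempotentProcessor and its process method are hypothetical names, and the in-memory map is an assumption made to keep the example self-contained. A production system would typically use a durable store with expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical handler that deduplicates retried requests by idempotency key. */
final class IdempotentProcessor {
    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();

    /**
     * Runs the operation only for the first request with a given key and
     * returns the remembered result for any later request with the same key.
     * The operation must return a non-null result, otherwise nothing is stored.
     */
    String process(String idempotencyKey, Supplier<String> operation) {
        // computeIfAbsent applies the function at most once per key, even under concurrent retries
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> operation.get());
    }
}
```

The client generates one key per logical operation (not per attempt) and sends it with every retry, for example in a request header, so that a retried payment or order creation does not produce a second side effect.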

Summary and Next Steps

Spring Cloud LoadBalancer provides a modern, reactive, and enterprise-grade client-side load-balancing framework for distributed microservices architectures. By decentralizing routing logic into the client layer, organizations achieve:

  • Lower network latency
  • Improved scalability
  • Zone-aware intelligent routing
  • Advanced resilience patterns
  • Reduced infrastructure bottlenecks

Throughout this guide, we explored:

  • Architectural foundations of client-side load balancing
  • Migration from Ribbon to Spring Cloud LoadBalancer
  • Internal SCLB architecture and lifecycle
  • Reactive and blocking HTTP client integrations
  • Custom routing strategies
  • Canary deployments
  • Zone-aware traffic optimization
  • Resilience4j integration
  • Production observability and troubleshooting

In modern cloud-native systems, mastering intelligent traffic distribution is essential for building highly available, fault-tolerant, and cost-efficient platforms.

Recommended Next Topics

  • Spring Cloud Gateway
  • Resilience4j Circuit Breakers
  • Distributed Tracing with OpenTelemetry
  • Service Mesh with Istio
  • Kubernetes Service Discovery
  • Reactive Microservices using Spring WebFlux

About the Author

Naresh Kumar

Senior Java Backend Engineer experienced in Banking, Payments, ISO 20022, Spring Boot, Microservices, Kafka, Docker, Kubernetes, AWS and Cloud Native Systems.

Built enterprise payment solutions, transaction processing systems, API platforms and scalable microservices used in production.