Kafka Consumer Groups and Rebalancing: The Complete Guide
Last Updated: May 28, 2026
Learn how Kafka Consumer Groups work internally, how partitions are distributed among consumers, how Kafka handles rebalancing, and how to build scalable event-driven systems with production-ready consumer configurations.
Before reading this guide, we strongly recommend understanding the basics of Kafka Consumers and Offset Management.
In large-scale distributed systems, processing millions of events with a single consumer instance is rarely practical. Apache Kafka addresses this with Consumer Groups, which let multiple consumer instances share the processing load in parallel.
Consumer Groups are one of Kafka’s most important architectural features and are heavily used in:
- Microservices architectures
- Real-time analytics systems
- Fraud detection engines
- Event-driven banking systems
- IoT telemetry pipelines
- Streaming ETL platforms
- Cloud-native event processing
Without Consumer Groups, consumption of a topic could not be scaled horizontally simply by adding consumer instances on more servers.
This guide covers:
- Consumer group internals
- Partition assignment strategies
- Heartbeat mechanisms
- Rebalancing workflows
- Eager vs Cooperative rebalancing
- Offset handling during failures
- Monitoring consumer lag
- Production tuning
- Kubernetes deployment strategies
- Enterprise best practices
- Java implementation examples
- Kafka CLI management commands
- Interview questions and FAQs
Table of Contents
- What is a Kafka Consumer Group?
- How Partitions are Distributed
- Consumer Group Internal Architecture
- Group Coordinator and Heartbeats
- Understanding Rebalancing
- Internal Rebalance Workflow
- Why Rebalances Are Expensive
- Eager vs Cooperative Rebalancing
- Partition Assignment Strategies
- Static Membership
- Java Consumer Group Example
- Offset Management During Rebalancing
- Understanding Consumer Lag
- Monitoring Consumer Groups
- Kafka Consumer Groups in Kubernetes
- Real-World Use Cases
- Production Best Practices
- Common Mistakes
- FAQs
- Interview Questions
- Summary
What is a Kafka Consumer Group?
A Kafka Consumer Group is a collection of consumer instances working together to consume records from one or more Kafka topics.
Every consumer group is identified using a unique:
group.id
Kafka uses consumer groups to distribute partition ownership across multiple consumers and enable horizontal scalability.
The core rule of Kafka consumer groups is:
Each partition can be assigned to only ONE consumer within the same consumer group.
This rule guarantees:
- Strict ordering within partitions
- No duplicate concurrent processing
- Predictable offset management
- Reliable parallelism
A single consumer, however, may own multiple partitions simultaneously.
How Partitions are Distributed
Scenario A: Consumers Less Than Partitions
Partitions: [P0] [P1] [P2] [P3]
Consumers:
C1 -> P0, P1
C2 -> P2, P3
This is the most common production architecture because it balances parallelism and infrastructure cost efficiently.
Scenario B: Consumers Equal to Partitions
Partitions: [P0] [P1] [P2] [P3]
Consumers:
C1 -> P0
C2 -> P1
C3 -> P2
C4 -> P3
This provides maximum parallel processing capacity.
Scenario C: Consumers Greater Than Partitions
Partitions: [P0] [P1] [P2]
Consumers:
C1 -> P0
C2 -> P1
C3 -> P2
C4 -> IDLE
C5 -> IDLE
Extra consumers remain idle because Kafka never splits a single partition across multiple consumers in the same group.
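The three scenarios above can be reproduced with a small model. The sketch below is illustrative only and is not Kafka's assignor code: the class name PartitionDistributionSketch and its distribute method are hypothetical, and it assumes a single topic whose partitions are handed out in contiguous blocks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: this is not Kafka's assignor implementation.
final class PartitionDistributionSketch {

    // Splits partitions 0..partitionCount-1 into contiguous blocks, one block per consumer.
    static Map<String, List<Integer>> distribute(int partitionCount, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        int base = partitionCount / consumers.size();   // partitions every consumer receives
        int extra = partitionCount % consumers.size();  // the first `extra` consumers receive one more
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            // A consumer with an empty list is idle (Scenario C).
            assignment.put(consumers.get(i), owned);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(distribute(4, List.of("C1", "C2")));                   // Scenario A
        System.out.println(distribute(4, List.of("C1", "C2", "C3", "C4")));       // Scenario B
        System.out.println(distribute(3, List.of("C1", "C2", "C3", "C4", "C5"))); // Scenario C
    }
}
```

In the third call, C4 and C5 end up with empty lists, which mirrors the idle consumers in Scenario C.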
Consumer Group Internal Architecture
Internally, Kafka uses multiple coordination components to manage consumer groups reliably.
Internal Workflow Diagram
+----------------------------------------------------------------------------------+
| KAFKA CLUSTER |
| |
| Topic: customer-orders |
| |
| +----------------+ +----------------+ +----------------+ |
| | Partition P0 | | Partition P1 | | Partition P2 | |
| +----------------+ +----------------+ +----------------+ |
| |
| ^ ^ ^ |
| | | | |
| | Fetch | Fetch | Fetch |
+---------------|------------------|--------------------|--------------------------+
| | |
v v v
+----------------------------------------------------------------------------------+
| CONSUMER GROUP: order-service-group |
| |
| +-------------------+ +-------------------+ |
| | Consumer Instance | | Consumer Instance | |
| | C1 | | C2 | |
| +-------------------+ +-------------------+ |
| | | | |
| | +---- P1 +---- P2 |
| +------------ P0 |
| |
| - Sends Heartbeats to Group Coordinator |
| - Commits Offsets to __consumer_offsets |
| - Polls Messages Continuously |
+----------------------------------------------------------------------------------+
|
v
+--------------------------------------------------+
| GROUP COORDINATOR |
|--------------------------------------------------|
| - Tracks active consumers |
| - Detects failures |
| - Triggers rebalancing |
| - Assigns partitions |
+--------------------------------------------------+
The Group Coordinator is one of Kafka’s brokers responsible for coordinating membership and partition ownership.
Group Coordinator and Heartbeats
Each consumer continuously sends heartbeat requests to the Group Coordinator.
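These heartbeats tell Kafka:
- The consumer process is alive
- The consumer can still reach the Group Coordinator
- The consumer still owns its assigned partitions
Heartbeats are sent from a background thread, so they do not show that the application is still processing records. That is checked separately through max.poll.interval.ms.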
If heartbeats stop arriving within:
session.timeout.ms
Kafka assumes the consumer has failed and initiates a rebalance.
Important Configurations
heartbeat.interval.ms
session.timeout.ms
max.poll.interval.ms
Recommended Production Settings
heartbeat.interval.ms = 3000
session.timeout.ms = 45000
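A common rule of thumb is to keep the heartbeat interval at no more than one third of the session timeout, so that several heartbeats can be missed before the session expires:
heartbeat.interval.ms ≤ 1/3 of session.timeout.ms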
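As a rough illustration of how these settings relate, the sketch below builds them as plain string properties and checks the ratio. The class and method names are hypothetical, the one-third check treats the rule of thumb as an upper bound, and sessionExpired is a simplified model of the coordinator's decision rather than broker code.

```java
import java.util.Properties;

// Illustrative only: plain java.util.Properties, no Kafka client classes involved.
final class HeartbeatSettingsSketch {

    // The three liveness settings discussed above, using the values from this guide.
    static Properties livenessSettings() {
        Properties props = new Properties();
        props.setProperty("heartbeat.interval.ms", "3000");
        props.setProperty("session.timeout.ms", "45000");
        props.setProperty("max.poll.interval.ms", "300000");
        return props;
    }

    // True when the heartbeat interval is at most one third of the session timeout.
    static boolean heartbeatFitsSession(Properties props) {
        long heartbeat = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        long session = Long.parseLong(props.getProperty("session.timeout.ms"));
        return heartbeat * 3 <= session;
    }

    // Simplified model: the session expires when no heartbeat arrived within the timeout.
    static boolean sessionExpired(long lastHeartbeatMs, long nowMs, long sessionTimeoutMs) {
        return nowMs - lastHeartbeatMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        Properties props = livenessSettings();
        System.out.println("ratio ok: " + heartbeatFitsSession(props));
        System.out.println("expired after 50s of silence: " + sessionExpired(0, 50_000, 45_000));
    }
}
```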
Understanding Rebalancing
Rebalancing is the process where Kafka redistributes partition ownership among consumers in a group.
Rebalances are critical for:
- Fault tolerance
- Load balancing
- Horizontal scalability
- Automatic recovery
What Triggers Rebalancing?
- New consumer joins group
- Consumer crashes
- Consumer shuts down
- Heartbeat timeout occurs
- Partitions are added
- Subscription patterns change
- max.poll.interval.ms exceeded
Consumer Group Rebalancing Internal Workflow
+----------------------------------------------------------------------------------+
| KAFKA CONSUMER GROUP REBALANCING |
+----------------------------------------------------------------------------------+
STEP 1: Consumer C3 Joins Group
--------------------------------------------------
Existing Group:
C1 -> P0, P1
C2 -> P2, P3
+
|
v
New Consumer C3
STEP 2: Coordinator Detects Membership Change
--------------------------------------------------
+--------------------------------------------------+
| GROUP COORDINATOR |
|--------------------------------------------------|
| - Detects new consumer |
| - Stops current assignments |
| - Initiates rebalance |
+--------------------------------------------------+
STEP 3: Partitions Revoked
--------------------------------------------------
C1 releases partitions
C2 releases partitions
STEP 4: New Assignments Calculated
--------------------------------------------------
C1 -> P0
C2 -> P1
C3 -> P2, P3
STEP 5: Consumers Resume Processing
--------------------------------------------------
All consumers continue polling records
with updated ownership assignments
+----------------------------------------------------------------------------------+
Why Kafka Rebalances Can Become Expensive
In small Kafka clusters, rebalancing is usually quick and harmless.
However, in enterprise-scale systems with:
- Hundreds of brokers
- Thousands of partitions
- Large consumer groups
- Heavy stateful processing
rebalances can become extremely expensive operations.
During Rebalancing
- Consumers stop processing temporarily
- Partitions are revoked
- Offset commits pause
- Network coordination increases
- Partition ownership changes
- Local caches may invalidate
- Kafka Streams state stores may rebuild
Frequent rebalances can cause:
- Latency spikes
- Consumer lag growth
- Duplicate processing
- Reduced throughput
- System instability
Eager vs Cooperative Rebalancing
1. Eager Rebalancing
Eager rebalancing is the original protocol used by Kafka consumer clients.
In eager rebalancing:
- All consumers stop processing
- All partitions are revoked
- Assignments recalculate from scratch
- Consumers resume afterward
This creates a "stop-the-world" effect.
Problems with Eager Rebalancing
- High processing pauses
- Heavy network coordination
- Large-scale partition movement
- Frequent duplicate processing
2. Cooperative Sticky Rebalancing
Kafka 2.4 introduced Cooperative Sticky Rebalancing.
This strategy:
- Moves only required partitions
- Allows unaffected consumers to continue processing
- Minimizes partition movement
- Reduces downtime significantly
Modern Kafka systems should strongly prefer Cooperative Sticky Rebalancing.
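The difference between the two protocols can be seen by counting revoked partitions. The sketch below is a simplified model, not the client's rebalance protocol: RevocationSketch and its methods are hypothetical names, and the "after" assignment is supplied by hand rather than computed by an assignor.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model of which partitions are revoked under each protocol.
final class RevocationSketch {

    // Eager: every member gives up everything it owns before the new assignment is applied.
    static Set<Integer> revokedEager(Map<String, List<Integer>> before) {
        Set<Integer> revoked = new LinkedHashSet<>();
        before.values().forEach(revoked::addAll);
        return revoked;
    }

    // Cooperative: a member gives up only the partitions it no longer owns afterwards.
    static Set<Integer> revokedCooperative(Map<String, List<Integer>> before,
                                           Map<String, List<Integer>> after) {
        Set<Integer> revoked = new LinkedHashSet<>();
        for (Map.Entry<String, List<Integer>> entry : before.entrySet()) {
            List<Integer> stillOwned = after.getOrDefault(entry.getKey(), List.of());
            for (Integer partition : entry.getValue()) {
                if (!stillOwned.contains(partition)) {
                    revoked.add(partition);
                }
            }
        }
        return revoked;
    }

    public static void main(String[] args) {
        // C3 joins a group of two consumers that own four partitions.
        Map<String, List<Integer>> before = Map.of("C1", List.of(0, 1), "C2", List.of(2, 3));
        Map<String, List<Integer>> after =
                Map.of("C1", List.of(0), "C2", List.of(2), "C3", List.of(1, 3));
        System.out.println("eager revokes " + revokedEager(before).size() + " partitions");
        System.out.println("cooperative revokes " + revokedCooperative(before, after).size() + " partitions");
    }
}
```

In this example the eager model revokes all four partitions, while the cooperative model revokes only the two that move to C3, so C1 and C2 keep processing P0 and P2 throughout.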
Partition Assignment Strategies
Range Assignor
Partitions are assigned in ranges per topic.
Simple but can create uneven distribution.
RoundRobin Assignor
Partitions are distributed evenly across consumers.
Provides better balance but increases partition movement.
Sticky Assignor
Attempts to preserve existing assignments during rebalances.
Reduces disruption significantly.
CooperativeStickyAssignor
Combines sticky assignments with cooperative rebalancing.
This is generally the best production strategy.
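To make the "uneven distribution" of range-style assignment concrete, the sketch below compares a range-style and a round-robin-style split over two topics. It is a simplified model under stated assumptions: every consumer subscribes to every topic, all topics have the same partition count, and the class and method names are hypothetical rather than Kafka's own assignor classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative comparison of two assignment styles; not Kafka's assignor code.
final class AssignorComparisonSketch {

    private static Map<String, List<String>> emptyAssignment(List<String> consumers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        return result;
    }

    // Range-style: each topic is split into contiguous ranges independently,
    // so the first consumers collect the remainder of every topic.
    static Map<String, List<String>> rangeStyle(List<String> topics, int partitionsPerTopic,
                                                List<String> consumers) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        for (String topic : topics) {
            int base = partitionsPerTopic / consumers.size();
            int extra = partitionsPerTopic % consumers.size();
            int next = 0;
            for (int i = 0; i < consumers.size(); i++) {
                int count = base + (i < extra ? 1 : 0);
                for (int j = 0; j < count; j++) {
                    result.get(consumers.get(i)).add(topic + "-" + next++);
                }
            }
        }
        return result;
    }

    // Round-robin-style: all topic-partitions form one list that is dealt out in turn.
    static Map<String, List<String>> roundRobinStyle(List<String> topics, int partitionsPerTopic,
                                                     List<String> consumers) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int turn = 0;
        for (String topic : topics) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                result.get(consumers.get(turn++ % consumers.size())).add(topic + "-" + p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> topics = List.of("orders", "payments");
        List<String> consumers = List.of("C1", "C2");
        System.out.println("range:       " + rangeStyle(topics, 3, consumers));
        System.out.println("round-robin: " + roundRobinStyle(topics, 3, consumers));
    }
}
```

With two topics of three partitions each, the range-style split gives C1 four partitions and C2 two, while the round-robin-style split gives each consumer three.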
Static Membership in Kafka Consumer Groups
Kafka 2.3 introduced Static Membership to reduce unnecessary rebalances.
Normally, restarting a consumer causes Kafka to treat it as a completely new member.
With static membership, each consumer is configured with a unique, stable value for:
group.instance.id
Kafka then recognizes a restarting consumer as the same logical member, provided it rejoins before session.timeout.ms expires.
Benefits
- Reduces rebalance storms
- Improves cache locality
- Minimizes partition movement
- Improves Kubernetes stability
- Reduces downtime
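A minimal configuration sketch for static membership follows. It uses plain string properties; the class name, the helper method, and the idea of deriving the instance id from a stable pod name such as "order-consumer-0" are assumptions made for illustration. The only requirement taken from Kafka itself is that group.instance.id is unique per consumer within the group and stays the same across restarts.

```java
import java.util.Properties;

// Illustrative configuration fragment; names other than the config keys are hypothetical.
final class StaticMembershipSketch {

    // stableName is assumed to survive restarts, e.g. a StatefulSet pod name.
    static Properties staticMemberSettings(String groupId, String stableName) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Must be unique per consumer in the group and identical after a restart.
        props.setProperty("group.instance.id", groupId + "-" + stableName);
        // The restarted member has to rejoin within this window to keep its partitions.
        props.setProperty("session.timeout.ms", "45000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = staticMemberSettings("order-processing-group", "order-consumer-0");
        System.out.println(props.getProperty("group.instance.id"));
    }
}
```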
Java Consumer Group Example
package com.example.kafka;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OrderConsumerGroup {

    public static void main(String[] args) {
        Properties properties = new Properties();

        // Connection and group identity
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processing-group");

        // Keys and values are read as strings
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Start from the earliest offset when the group has no committed offset
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // Cooperative rebalancing with sticky assignments
        properties.put(
            ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor"
        );

        // Liveness settings
        properties.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
        properties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
        properties.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");

        // The type parameters match the String deserializers configured above
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Collections.singletonList("customer-orders"));
        System.out.println("Consumer Group Started");

        try {
            while (true) {
                // Fetch the next batch, waiting up to 100 ms for records
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("Key: " + record.key());
                    System.out.println("Value: " + record.value());
                    System.out.println("Partition: " + record.partition());
                    System.out.println("Offset: " + record.offset());
                    System.out.println("-------------------");
                }
            }
        } catch (Exception e) {
            System.err.println("Consumer Error: " + e.getMessage());
        } finally {
            // Close the consumer and release its resources
            consumer.close();
            System.out.println("Consumer Closed Successfully");
        }
    }
}
Offset Management During Rebalancing
Kafka stores consumer offsets in:
__consumer_offsets
When partitions move during rebalances:
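- The new owner retrieves the latest committed offset for each partition
- Processing resumes from that offset
- Committed progress is preserved, so no records are skipped; records that were processed but not yet committed may be delivered again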
Offset Commit Strategies
- Auto Commit
- Manual Commit
Critical systems usually prefer manual offset commits for stronger reliability guarantees.
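The effect of uncommitted progress during a partition handover can be shown with a small simulation. The sketch below is a toy model, not client code: the class OffsetHandoverSketch is hypothetical, and it assumes the first owner commits after every commitEvery records and loses the partition without a final commit.

```java
// Toy model of one partition changing owner during a rebalance.
final class OffsetHandoverSketch {

    // Returns how many times each record (by offset) was processed across both owners.
    static int[] processCounts(int totalRecords, int commitEvery, int rebalanceAfter) {
        if (commitEvery <= 0 || rebalanceAfter < 0 || rebalanceAfter > totalRecords) {
            throw new IllegalArgumentException("invalid simulation parameters");
        }
        int[] counts = new int[totalRecords];
        int committed = 0; // next offset to read, as stored for the group

        // First owner: processes records and commits periodically, then loses the partition.
        for (int offset = 0; offset < rebalanceAfter; offset++) {
            counts[offset]++;
            if ((offset + 1) % commitEvery == 0) {
                committed = offset + 1;
            }
        }

        // New owner: resumes from the last committed offset, not from where the old owner stopped.
        for (int offset = committed; offset < totalRecords; offset++) {
            counts[offset]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = processCounts(10, 4, 6);
        for (int offset = 0; offset < counts.length; offset++) {
            System.out.println("offset " + offset + " processed " + counts[offset] + " time(s)");
        }
    }
}
```

With ten records, a commit every four records, and a rebalance after six, offsets 4 and 5 are processed twice and nothing is skipped. Committing from a ConsumerRebalanceListener's onPartitionsRevoked callback is one common way to narrow that window.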
Understanding Consumer Lag
Consumer Lag is the difference between:
- Latest broker offset
- Latest committed consumer offset
Lag indicates how far behind consumers are from real-time processing.
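The lag calculation itself is simple arithmetic. The following sketch assumes the offsets have already been obtained elsewhere (for example from the CLI output); the class name is hypothetical, and treating a partition with no committed offset as offset 0 is an assumption made for the example.

```java
import java.util.Map;

// Illustrative arithmetic only; offsets are passed in rather than fetched from a broker.
final class ConsumerLagSketch {

    // Lag for one partition: latest broker offset minus latest committed offset.
    static long partitionLag(long latestBrokerOffset, long committedOffset) {
        return Math.max(0, latestBrokerOffset - committedOffset);
    }

    // Sum of per-partition lag; a partition without a committed offset counts from 0 (assumption).
    static long totalLag(Map<Integer, Long> latestBrokerOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> entry : latestBrokerOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            total += partitionLag(entry.getValue(), committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> latest = Map.of(0, 1500L, 1, 900L);
        Map<Integer, Long> committed = Map.of(0, 1200L, 1, 900L);
        System.out.println("total lag: " + totalLag(latest, committed));
    }
}
```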
Causes of High Consumer Lag
- Slow databases
- Heavy processing logic
- Frequent rebalances
- Network bottlenecks
- Too few consumers
- Large message batches
Risks of High Lag
- Delayed notifications
- Stale analytics
- Fraud detection delays
- Backpressure
- System instability
Monitoring Consumer Groups
Important Metrics
- Consumer Lag
- Rebalance Frequency
- Heartbeat Failures
- Poll Latency
- Offset Commit Latency
Kafka CLI Commands
Describe Consumer Group
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe \
  --group order-processing-group
List Consumer Groups
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --list
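Reset Offsets
The group must have no active members when offsets are reset. Replace --execute with --dry-run to preview the result first.
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --group order-processing-group \
  --reset-offsets \
  --to-earliest \
  --execute \
  --topic customer-orders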
Kafka Consumer Groups in Kubernetes Environments
Modern Kafka applications are commonly deployed inside Kubernetes clusters.
Each Kubernetes pod may run one Kafka consumer instance.
Kubernetes introduces new challenges:
- Frequent pod restarts
- Autoscaling spikes
- Node failures
- Rolling deployments
- Container startup delays
Without proper tuning, Kubernetes can create rebalance storms.
Recommended Strategies
- CooperativeStickyAssignor
- Static Membership
- Graceful shutdown hooks
- Lag-based autoscaling
- Readiness probes
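Lag-based autoscaling comes down to a sizing rule. The sketch below shows one possible rule and is not tied to any particular autoscaler: the class name, the targetLagPerConsumer parameter, and the formula are assumptions for illustration. The cap at the partition count follows from the rule that extra consumers in a group stay idle.

```java
// Illustrative sizing rule for lag-based scaling; the formula is an assumption, not a standard.
final class LagAutoscalingSketch {

    // Desired consumer count: enough to bring lag per consumer down to the target,
    // at least one, and never more than the number of partitions.
    static int desiredConsumers(long totalLag, long targetLagPerConsumer, int partitionCount) {
        if (totalLag < 0 || targetLagPerConsumer <= 0 || partitionCount <= 0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        long needed = (totalLag + targetLagPerConsumer - 1) / targetLagPerConsumer; // ceiling division
        long atLeastOne = Math.max(1, needed);
        return (int) Math.min(partitionCount, atLeastOne);
    }

    public static void main(String[] args) {
        System.out.println(desiredConsumers(25_000, 10_000, 12));    // moderate lag
        System.out.println(desiredConsumers(1_000_000, 10_000, 12)); // capped by partition count
        System.out.println(desiredConsumers(0, 10_000, 12));         // no lag, keep one consumer
    }
}
```

Because each scale-up or scale-down changes group membership, a rule like this is usually paired with the other strategies in the list so that scaling events do not themselves cause rebalance storms.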
Real-World Use Cases
1. E-Commerce Order Processing
An e-commerce platform processes millions of order events daily using consumer groups distributed across Kubernetes pods.
2. Banking Fraud Detection
Banks use multiple consumer groups for:
- Fraud detection
- Notifications
- Audit logging
- Analytics
3. Real-Time Analytics Platforms
Analytics systems use large consumer groups to process:
- User clicks
- Ad impressions
- Video watch events
- Telemetry streams
Production Best Practices
- Prefer Cooperative Sticky Rebalancing
- Use one consumer per thread (KafkaConsumer is not thread-safe)
- Monitor consumer lag continuously
- Avoid long processing inside poll loop
- Use static membership
- Implement graceful shutdown handling
- Tune max.poll.records carefully
- Use manual commits for critical systems
- Avoid excessive partition counts
- Use DLQ strategies for failures
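Two of the items above, keeping processing inside the poll loop short and tuning max.poll.records, can be checked with a rough time budget. The sketch below is back-of-the-envelope arithmetic under stated assumptions: records are processed one at a time, perRecordMillis is a worst-case estimate you supply, and the class name and safetyFactor parameter are hypothetical.

```java
// Rough time budget for one poll() batch; all inputs are estimates supplied by the caller.
final class PollBudgetSketch {

    // Worst-case time to work through one full batch.
    static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return (long) maxPollRecords * perRecordMillis;
    }

    // True when a full batch can be processed before max.poll.interval.ms elapses.
    static boolean fitsPollInterval(int maxPollRecords, long perRecordMillis, long maxPollIntervalMs) {
        return worstCaseBatchMillis(maxPollRecords, perRecordMillis) < maxPollIntervalMs;
    }

    // Largest batch size that uses at most safetyFactor (0..1] of the poll interval.
    static int safeMaxPollRecords(long perRecordMillis, long maxPollIntervalMs, double safetyFactor) {
        if (perRecordMillis <= 0 || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("invalid budget parameters");
        }
        return (int) (maxPollIntervalMs * safetyFactor / perRecordMillis);
    }

    public static void main(String[] args) {
        // 500 records per poll at 800 ms each against a 300000 ms poll interval.
        System.out.println("fits: " + fitsPollInterval(500, 800, 300_000));
        System.out.println("suggested max.poll.records: " + safeMaxPollRecords(800, 300_000, 0.5));
    }
}
```

In this example a batch of 500 records at 800 ms each would take 400000 ms, which exceeds the 300000 ms interval, so either the batch size or the per-record work needs to shrink.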
Common Mistakes to Avoid
1. Too Many Consumers
Extra consumers remain idle if partitions are insufficient.
2. Long Processing Times
Slow processing can exceed max.poll.interval.ms and trigger rebalances.
3. Frequent Restarts
Rapid pod restarts create rebalance storms.
4. Ignoring Consumer Lag
Lag growth indicates consumers cannot keep up.
5. Using Eager Rebalancing in Large Consumer Groups
This can create severe latency spikes.
Frequently Asked Questions (FAQs)
Can multiple consumer groups read the same topic?
Yes. Each consumer group maintains independent offsets.
Can two consumers in the same group read the same partition?
No. Kafka assigns each partition to only one consumer within a group.
What happens if a consumer crashes?
Kafka detects heartbeat failures and redistributes partitions.
Why are rebalances expensive?
They temporarily pause processing and require reassignment coordination.
How do I reduce rebalance frequency?
Use cooperative rebalancing, stable consumers, and static membership.
What is consumer lag?
It represents how far consumers are behind the latest broker offsets.
Interview Questions and Answers
What is a Consumer Group?
A collection of consumers sharing the same group.id to process partitions in parallel.
What triggers rebalancing?
Consumer joins, crashes, heartbeat failures, partition changes, and poll interval violations.
Difference between session.timeout.ms and max.poll.interval.ms?
session.timeout.ms monitors heartbeats while max.poll.interval.ms monitors application processing delays.
Why is Cooperative Sticky Rebalancing better?
It minimizes partition movement and reduces downtime.
How does Kafka guarantee ordering?
Kafka guarantees ordering only within a single partition.
Where are consumer offsets stored?
Offsets are stored in the internal __consumer_offsets topic.
Summary
Kafka Consumer Groups are the foundation of scalable event-driven architectures in Apache Kafka.
They enable:
- Parallel processing
- Fault tolerance
- Automatic recovery
- Horizontal scalability
- Distributed stream processing
Rebalancing ensures healthy workload distribution, but poorly configured rebalances can introduce major latency and stability issues.
Modern Kafka systems should strongly prefer:
- Cooperative Sticky Rebalancing
- Static Membership
- Graceful shutdown handling
- Consumer lag monitoring
- Optimized heartbeat tuning
By mastering consumer groups, partition ownership, heartbeats, offsets, and rebalancing internals, you can build highly resilient Kafka architectures capable of processing millions of events reliably at scale.