Working with Kafka Topics and Partitions
Last Updated: May 28, 2026
Learn how Apache Kafka topics and partitions work internally, how Kafka stores data, how producers distribute messages, how partitions enable scalability, and how to manage Kafka topics using real-world architecture concepts and Kafka CLI commands.
If you are new to Kafka fundamentals, first read:
- Introduction to Event Streaming and Apache Kafka
- Understanding Apache Kafka Architecture and Core Concepts
Table of Contents
- Understanding Kafka Topics
- Understanding Kafka Partitions
- Understanding Kafka Offsets
- Why Kafka Uses Partitions
- How Producers Route Messages
- Sticky Partitioner Explained
- Replication and Fault Tolerance
- Partitions and Consumer Groups
- Kafka Topic CLI Commands
- Partition Scaling Strategy
- Real-World Use Cases
- Common Mistakes to Avoid
- Partition Design Best Practices
- Interview Questions
- Frequently Asked Questions
- Summary
Understanding Kafka Topics
A Kafka Topic is a logical category or stream where producers publish events and consumers subscribe to read those events.
A topic acts as a continuously growing append-only event log.
Examples of Kafka topics:
- customer-orders
- payment-events
- inventory-updates
- email-notifications
- user-login-events
Topics decouple producers from consumers.
This means producers do not need to know:
- Who consumes the data
- How many consumers exist
- When consumers process messages
Similarly, consumers do not need direct communication with producers.
This decoupling is one of the most important architectural advantages of Kafka in event-driven systems.
A single Kafka topic can have:
- Multiple producers
- Multiple consumer groups
- Millions of messages
- Continuous real-time event streams
Understanding Kafka Partitions
Kafka topics are divided into smaller units called partitions.
Partitions are the actual storage mechanism Kafka uses internally.
Instead of storing all of a topic's data in one place, Kafka splits the topic into multiple partitions for better scalability and parallel processing.
Each partition is:
- Immutable (records cannot be changed once written)
- Append-only
- Sequentially ordered
- Stored independently
Messages are always appended at the end of the partition log.
Topic: customer-orders
+------------------------------------------------------+
| Partition 0                                          |
| Offset 0 -> Offset 1 -> Offset 2 -> Offset 3         |
+------------------------------------------------------+
+------------------------------------------------------+
| Partition 1                                          |
| Offset 0 -> Offset 1 -> Offset 2                     |
+------------------------------------------------------+
+------------------------------------------------------+
| Partition 2                                          |
| Offset 0 -> Offset 1 -> Offset 2 -> Offset 3         |
+------------------------------------------------------+
Each partition can reside on a different Kafka broker.
This allows Kafka clusters to scale horizontally across multiple servers.
Kafka guarantees strict ordering only inside an individual partition.
Ordering is not guaranteed across multiple partitions.
If ordering is important for your business workflow, related messages must always go to the same partition using a partition key.
Understanding Kafka Offsets
Every message inside a partition receives a unique sequential identifier called an offset.
Offsets are extremely important because they allow Kafka consumers to track message processing progress.
Consumers use offsets to:
- Resume processing after crashes
- Replay historical messages
- Track processed events
- Handle failures safely
- Implement retry mechanisms
Partition 0
Offset 0 -> OrderCreated
Offset 1 -> PaymentProcessed
Offset 2 -> OrderPacked
Offset 3 -> OrderShipped
Offsets are unique only within a partition.
Offset 5 in Partition 0 is different from Offset 5 in Partition 1.
Kafka stores each consumer group's committed offsets separately from the messages themselves.
Consumer groups commit offsets periodically to Kafka's internal offsets topic, __consumer_offsets.
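The relationship between a partition log, its offsets, and a consumer group's committed position can be sketched with a small in-memory model. This is a toy illustration only, not Kafka's storage format or client API; the class names PartitionLog and OffsetStore and the group name "billing" are hypothetical. It follows the convention that the committed offset is the next offset the group will read.

```python
from collections import defaultdict


class PartitionLog:
    """Toy append-only log: a record's offset is its position in the list."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset assigned to the new record

    def read_from(self, offset):
        return self._records[offset:]

    def end_offset(self):
        return len(self._records)  # offset the next appended record will get


class OffsetStore:
    """Stand-in for committed offsets, kept apart from the record data."""

    def __init__(self):
        # (group, partition) -> next offset to read; unknown groups start at 0
        self._committed = defaultdict(int)

    def commit(self, group, partition, next_offset):
        self._committed[(group, partition)] = next_offset

    def committed(self, group, partition):
        return self._committed[(group, partition)]


log = PartitionLog()
for event in ["OrderCreated", "PaymentProcessed", "OrderPacked", "OrderShipped"]:
    log.append(event)

offsets = OffsetStore()
# The consumer handled offsets 0 and 1, then committed "next offset = 2".
offsets.commit("billing", 0, 2)

# After a restart, the consumer resumes from the committed position.
resume_at = offsets.committed("billing", 0)
remaining = log.read_from(resume_at)
lag = log.end_offset() - resume_at  # records not yet processed by the group

print(remaining, lag)
```

Because the committed position lives outside the log, a second group can read the same partition from its own position without affecting the first.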
Why Kafka Uses Partitions
Partitions are the foundation of Kafka scalability and performance.
1. Horizontal Scalability
A single server has limitations in:
- Disk storage
- CPU power
- Network bandwidth
- Read/write throughput
Partitions allow Kafka to distribute data across multiple brokers.
Partition 0 -> Broker A
Partition 1 -> Broker B
Partition 2 -> Broker C
This architecture enables Kafka clusters to handle:
- Millions of messages per second
- Petabytes of storage
- Large-scale distributed processing
2. Parallel Processing
Partitions allow multiple consumers to process messages simultaneously.
This is one of the most powerful features of Kafka consumer groups.
Topic Partitions = 4
Consumer Instances = 4

Consumer 1 -> Partition 0
Consumer 2 -> Partition 1
Consumer 3 -> Partition 2
Consumer 4 -> Partition 3
Each consumer processes a different partition independently.
This improves:
- Processing speed
- Throughput
- Scalability
- Concurrency
3. Fault Tolerance
Partitions are replicated across multiple Kafka brokers.
If a broker crashes, Kafka automatically promotes another in-sync replica to leader for each partition that broker was leading.
This ensures:
- High availability
- Minimal downtime
- Data durability
- Automatic recovery
How Producers Route Messages
When a producer sends a message to Kafka, the producer client must determine which partition will store that message.
This decision is made by the producer's partitioner.
With a Partition Key
If a producer sends a key with the message:
customer_id = 1001
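The default partitioner hashes the serialized key (using the murmur2 hash) and calculates:
hash(key) % number_of_partitions
As long as the partition count does not change, this guarantees:
- The same key always maps to the same partition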
- Ordering remains consistent
- Related events stay together
Example:
OrderCreated
PaymentProcessed
OrderShipped
All events for the same order remain in sequence.
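The key-to-partition mapping can be sketched in a few lines. This is a minimal illustration, not the producer's actual code: zlib.crc32 is a stand-in hash chosen so the example runs with the standard library alone, and the keys and the partition_for helper are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with hash(key) % num_partitions.

    zlib.crc32 is a stand-in hash for this sketch; it is not the hash
    function used by Kafka's producer.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
events = [
    ("order-1001", "OrderCreated"),
    ("order-2002", "OrderCreated"),
    ("order-1001", "PaymentProcessed"),
    ("order-1001", "OrderShipped"),
]

# Append each event to the partition chosen by its key.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, event in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, event))

# Every event for order-1001 sits in one partition, in the order it was sent.
target = partition_for("order-1001", NUM_PARTITIONS)
order_1001 = [event for key, event in partitions[target] if key == "order-1001"]
print(target, order_1001)
```

Events for other keys may share the same partition; ordering is preserved per partition, so each key's events still appear in the sequence they were produced.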
Without a Partition Key
If no key is provided, the producer uses sticky partitioning by default (Kafka 2.4 and later).
The sticky partitioner selects one partition and sends records there until the current batch is complete, either because it is full or because the linger.ms wait has elapsed.
Then it switches to another partition.
Benefits:
- Better batching efficiency
- Lower network overhead
- Higher throughput
- Reduced latency
Sticky Partitioner Explained
Kafka versions before 2.4 used round-robin partitioning for messages without keys.
Modern Kafka producers use sticky partitioning because it improves batching performance dramatically.
Instead of constantly switching partitions for every message, sticky partitioning temporarily "sticks" to one partition.
This increases batch sizes and improves compression efficiency.
This optimization significantly improves producer throughput in high-volume systems.
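The batching effect can be seen in a simplified simulation that compares the two strategies for keyless records. This is a sketch under stated assumptions: the sticky strategy here moves to the next partition in sequence (a real producer does not pick partitions this way), time-based flushing is ignored, and all function names are hypothetical.

```python
from collections import Counter
from math import ceil


def round_robin_assign(num_records, num_partitions):
    """Send each record to the next partition in turn."""
    return [i % num_partitions for i in range(num_records)]


def sticky_assign(num_records, num_partitions, batch_size):
    """Stay on one partition until batch_size records are queued, then move on.

    Simplification: the next partition is chosen in sequence so the output
    is deterministic.
    """
    return [(i // batch_size) % num_partitions for i in range(num_records)]


def batches_sent(assignment, batch_size):
    """Batches needed if all records are queued and then flushed once."""
    per_partition = Counter(assignment)
    return sum(ceil(n / batch_size) for n in per_partition.values())


RECORDS, PARTITIONS, BATCH_SIZE = 8, 3, 4

rr = round_robin_assign(RECORDS, PARTITIONS)
sticky = sticky_assign(RECORDS, PARTITIONS, BATCH_SIZE)

print(rr, batches_sent(rr, BATCH_SIZE))          # three partly filled batches
print(sticky, batches_sent(sticky, BATCH_SIZE))  # two full batches
```

With the same eight records, round-robin spreads them thinly over three batches, while the sticky strategy fills two batches completely, which is the source of the batching and compression gains described above.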
Replication and Fault Tolerance
Each partition can have multiple replicas distributed across brokers.
Partition 0
Leader Replica   -> Broker 1
Follower Replica -> Broker 2
Follower Replica -> Broker 3
By default, the leader handles:
- Read requests
- Write requests
- Producer acknowledgments
Followers continuously replicate the leader's data.
If the leader broker fails:
- Kafka elects a new leader
- Consumers reconnect automatically
- Producers continue sending data
This failover process happens automatically.
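The failover steps can be modeled with a small state object. This is a toy sketch, not Kafka's controller logic: the PartitionState class and handle_broker_failure function are hypothetical, and promoting the first remaining in-sync replica is a simplifying assumption about how the new leader is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionState:
    leader: int     # broker id of the current leader
    replicas: list  # broker ids that hold a copy of the partition
    isr: list = field(default_factory=list)  # replicas caught up with the leader


def handle_broker_failure(state: PartitionState, failed_broker: int) -> PartitionState:
    """Remove the failed broker from the in-sync set and, if it was the
    leader, promote the first remaining in-sync replica."""
    isr = [b for b in state.isr if b != failed_broker]
    leader = state.leader
    if leader == failed_broker:
        if not isr:
            raise RuntimeError("no in-sync replica available to take over")
        leader = isr[0]
    return PartitionState(leader=leader, replicas=state.replicas, isr=isr)


p0 = PartitionState(leader=1, replicas=[1, 2, 3], isr=[1, 2, 3])
after = handle_broker_failure(p0, failed_broker=1)
print(after.leader, after.isr)
```

If a follower fails instead of the leader, the leader stays in place and only the in-sync set shrinks.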
Partitions and Consumer Groups
Consumer groups allow Kafka consumers to process data collaboratively.
Within a consumer group, each partition is assigned to only one consumer instance at a time.
Example:
Topic Partitions = 3
Consumer Instances = 5
Result:
- 3 consumers become active
- 2 consumers remain idle
Within a single consumer group, maximum consumer parallelism equals the number of partitions.
This is why partition planning is extremely important in Kafka architecture design.
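The idle-consumer effect follows directly from the one-consumer-per-partition rule, as the sketch below shows. It uses a plain round-robin assignment for illustration; Kafka's actual assignment strategies are configurable and more involved, and the function and consumer names are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


consumers = [f"consumer-{i}" for i in range(1, 6)]  # 5 consumer instances
assignment = assign_partitions(3, consumers)        # topic with 3 partitions

active = [c for c, parts in assignment.items() if parts]
idle = [c for c, parts in assignment.items() if not parts]
print(assignment)
print(len(active), "active,", len(idle), "idle")
```

Adding partitions, not consumers, is what raises the ceiling: with 4 partitions and 2 consumers, each consumer would own two partitions and none would be idle.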
Kafka Topic CLI Commands
The following examples assume Kafka is running locally on port 9092.
Create a Topic
kafka-topics.sh \
  --create \
  --bootstrap-server localhost:9092 \
  --replication-factor 1 \
  --partitions 3 \
  --topic ecommerce-transactions
Describe a Topic
kafka-topics.sh \
  --describe \
  --bootstrap-server localhost:9092 \
  --topic ecommerce-transactions
This displays:
- Partition count
- Leader brokers
- Replica brokers
- ISR replicas
- Topic configuration
List Topics
kafka-topics.sh \
  --list \
  --bootstrap-server localhost:9092
Increase Partitions
kafka-topics.sh \
  --alter \
  --bootstrap-server localhost:9092 \
  --topic ecommerce-transactions \
  --partitions 6
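Important: Increasing the partition count changes how keys map to partitions.
New messages for an existing key may land in a different partition than its earlier messages, so per-key ordering is no longer guaranteed across the change.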
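The effect of going from 3 to 6 partitions can be checked with a short simulation. This is a sketch only: zlib.crc32 stands in for the producer's real hash function, and the customer keys are hypothetical. Which specific keys move depends on the hash, so the example prints them rather than assuming a count.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # zlib.crc32 is a stand-in hash for this sketch, not Kafka's own.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"customer-{i}" for i in range(1000, 1020)]

before = {key: partition_for(key, 3) for key in keys}  # original topic
after = {key: partition_for(key, 6) for key in keys}   # after --partitions 6

# Keys whose new messages are routed to a different partition than before.
moved = sorted(key for key in keys if before[key] != after[key])

for key in moved:
    print(f"{key}: partition {before[key]} -> {after[key]}")
print(f"{len(moved)} of {len(keys)} keys changed partition")
```

Records already written stay where they are; only new records follow the new mapping, which is why a key's history can end up split across two partitions.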
Delete a Topic
kafka-topics.sh \
  --delete \
  --bootstrap-server localhost:9092 \
  --topic ecommerce-transactions
Partition Scaling Strategy
Choosing the correct partition count is one of the most important Kafka architecture decisions.
Too few partitions create bottlenecks.
Too many partitions increase:
- Metadata overhead
- Memory consumption
- Controller load
- Broker recovery time
Partition sizing should consider:
- Expected throughput
- Future traffic growth
- Consumer scaling needs
- Storage requirements
- Replication factor
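One common way to turn the throughput and growth factors above into a starting number is a back-of-the-envelope estimate: divide the target throughput by what a single partition can sustain on the slower side (producing or consuming). The sketch below assumes that rule of thumb; the function name and all figures are hypothetical, and per-partition throughput should come from measurements in your own environment.

```python
from math import ceil


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition, growth_factor=1.0):
    """Rough partition count: planned throughput divided by the throughput
    one partition can sustain on its slower side."""
    slowest = min(producer_mb_s_per_partition, consumer_mb_s_per_partition)
    if slowest <= 0:
        raise ValueError("per-partition throughput must be positive")
    return max(1, ceil(target_mb_s * growth_factor / slowest))


# Hypothetical figures: 100 MB/s today, 50% growth headroom,
# 10 MB/s per partition when producing, 20 MB/s when consuming.
estimate = estimate_partitions(100, 10, 20, growth_factor=1.5)
print(estimate)
```

The result is a starting point for load testing rather than a final answer, since storage, replication factor, and broker limits also constrain the choice.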
Real-World Use Cases
E-Commerce Order Processing
Using order_id as partition key guarantees ordering of order lifecycle events.
This ensures:
- Each order's payment event is read before its shipping event, matching the order in which they were produced
- Order status remains consistent
- Consumers process events sequentially
Banking Systems
Bank transactions require strict ordering.
Using account_id as the partition key ensures transaction consistency.
Log Aggregation Platforms
Logs often prioritize throughput over strict ordering.
Messages are distributed evenly across partitions for maximum performance.
IoT Sensor Processing
Millions of sensor events are distributed across partitions for scalable ingestion and analytics processing.
Common Mistakes to Avoid
Over-Partitioning
Too many partitions increase:
- Open file handles
- Broker memory usage
- Metadata synchronization overhead
- Leader election time
Under-Partitioning
Too few partitions limit:
- Parallel processing
- Consumer scalability
- System throughput
Increasing Partitions on Keyed Topics
Changing partition count changes hash distribution.
Messages with the same key may later go to different partitions.
This can break ordering guarantees.
Using One Partition for Large Systems
Single-partition topics severely limit throughput and scalability.
Partition Design Best Practices
Use Stable Partition Keys
Good partition keys:
- customer_id
- account_id
- order_id
Plan for Future Growth
Always estimate:
- Future traffic
- Consumer scaling
- Storage growth
- Peak throughput
Monitor Partition Distribution
Uneven traffic distribution creates hot partitions.
Monitor:
- Broker CPU
- Disk usage
- Partition traffic
- Consumer lag
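A simple skew check over per-partition message counts can flag hot partitions. This is a minimal sketch: the counts are made-up sample data, the 2x-mean threshold is an arbitrary assumption, and in practice the numbers would come from your monitoring system.

```python
def find_hot_partitions(message_counts, threshold=2.0):
    """Return partitions whose traffic exceeds threshold x the mean."""
    if not message_counts:
        return []
    mean = sum(message_counts.values()) / len(message_counts)
    return sorted(p for p, n in message_counts.items() if n > threshold * mean)


# Hypothetical messages-per-minute figures for a 3-partition topic.
counts = {0: 100, 1: 110, 2: 900}
hot = find_hot_partitions(counts)
print(hot)
```

A partition flagged this way usually points to a skewed key, for example one customer or device producing most of the traffic.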
Avoid Frequent Partition Changes
Partition changes affect message routing consistency.
Design topics properly from the beginning.
Interview Questions
Does Kafka guarantee ordering across partitions?
No. Kafka guarantees ordering only within a single partition.
What determines maximum consumer parallelism?
Within a consumer group, maximum parallelism equals the number of partitions.
What is ISR in Kafka?
ISR stands for In-Sync Replicas.
These are the replicas, including the leader, that have kept up with the leader's log within the configured lag limit (replica.lag.time.max.ms).
What happens when a broker fails?
Kafka automatically elects a new leader from the ISR for each partition the failed broker was leading.
Why are partitions important?
Partitions enable:
- Scalability
- Parallelism
- Replication
- Fault tolerance
- High throughput
Frequently Asked Questions
Can Kafka partitions be decreased?
No. Kafka supports increasing a topic's partition count but not decreasing it. To end up with fewer partitions, create a new topic and migrate the data.
How many partitions should a topic have?
It depends on:
- Throughput requirements
- Consumer scaling
- Future growth
- Broker capacity
Can multiple consumers read the same partition?
Within the same consumer group: Only one consumer reads a partition.
Across different consumer groups: Multiple groups can read independently.
Why are offsets important?
Offsets allow:
- Recovery
- Replay
- Fault tolerance
- Consumer tracking
Next Step
Now continue learning Kafka producers and message publishing internals:
Understanding Kafka Producers and Sending Messages
Summary
Kafka topics and partitions are the foundation of scalable event-driven architecture.
Partitions enable:
- Horizontal scalability
- Parallel processing
- High throughput
- Fault tolerance
- Distributed storage
Understanding partition design is critical for building reliable enterprise Kafka systems.
Always carefully plan:
- Partition count
- Partition keys
- Replication factor
- Consumer scaling
- Ordering requirements
In the next lesson, you will learn how Kafka producers internally batch, compress, acknowledge, and publish events efficiently to Kafka brokers.