Multi-Cluster Kafka Deployments and MirrorMaker
As enterprise applications grow, a single Apache Kafka cluster may no longer be sufficient. Organizations often need to span their infrastructure across multiple geographical regions, data centers, or cloud providers. Managing data flow across these distinct environments requires a robust multi-cluster strategy. This guide explores multi-cluster Kafka deployments and details how to use Kafka MirrorMaker 2 to replicate data seamlessly between them.
Why Deploy Multiple Kafka Clusters?
Deploying a single, massive Kafka cluster across different geographical regions (often called a stretched cluster) introduces high network latency and risks split-brain scenarios if network partitions occur. Instead, running independent clusters in different regions and replicating data between them is the industry-standard approach. Here are the primary drivers for multi-cluster setups:
- Disaster Recovery (DR): If an entire cloud region or physical data center goes offline, a secondary cluster in another region can take over, ensuring business continuity.
- Data Localization and Compliance: Regulatory requirements (such as GDPR) may require user data to remain within specific geographic boundaries while allowing aggregated, non-sensitive data to be replicated globally.
- Geographic Proximity (Low Latency): Placing Kafka clusters closer to end-users reduces write and read latency. Users in Europe interact with a European cluster, while users in Asia interact with an Asian cluster.
- Cloud Migration and Hybrid Cloud: Organizations often need to replicate data between on-premises data centers and public cloud environments during migrations or for hybrid operational models.
Multi-Cluster Deployment Topologies
When designing a multi-cluster Kafka architecture, you can choose from several replication topologies depending on your business requirements.
1. Active-Passive (Active-Standby)
In this topology, one cluster (Active) handles all production reads and writes. MirrorMaker replicates data continuously to a secondary cluster (Passive). The secondary cluster remains idle or is used only for read-only analytical workloads until a failover event occurs.
 [ Producers ]
       |
       v
+--------------+                  +--------------+
|    Active    |                  |   Passive    |
|   Cluster    | ---------------> |   Cluster    |
|  (Region A)  |   MirrorMaker    |  (Region B)  |
+--------------+                  +--------------+
                                         |
                                         v
                                   [ Consumers ]
                                 (Only on Failover)
2. Active-Active
In an Active-Active topology, both clusters actively accept writes and reads from local applications. MirrorMaker replicates data bidirectionally between the clusters. This allows users in different regions to experience low-latency operations while still sharing a global view of the data.
[ Producers A ]                   [ Producers B ]
       |                                 |
       v                                 v
+--------------+   MirrorMaker    +--------------+
|  Cluster A   | <--------------> |  Cluster B   |
|  (Region A)  |  Bidirectional   |  (Region B)  |
+--------------+                  +--------------+
       |                                 |
       v                                 v
[ Consumers A ]                   [ Consumers B ]
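With MM2's default topic naming (covered later in this guide), records replicated from the other region land in a topic prefixed with the source cluster's alias rather than in the local topic of the same name. The following is a minimal Python sketch of the resulting topic layout after one replication hop. The aliases A and B and the orders topic are illustrative assumptions, and the helper is a model for reasoning, not part of any Kafka API.

```python
def topics_after_replication(local_topics, flows):
    """Model which topics exist on each cluster after one replication hop.

    local_topics: dict mapping cluster alias -> list of locally produced topics.
    flows: list of (source_alias, target_alias) replication flows.
    """
    # Every cluster starts with its own locally produced topics.
    layout = {alias: set(topics) for alias, topics in local_topics.items()}
    for source, target in flows:
        for topic in local_topics[source]:
            # Default MM2 naming: "<source alias>.<topic>" on the target.
            layout[target].add(f"{source}.{topic}")
    return layout


if __name__ == "__main__":
    layout = topics_after_replication(
        {"A": ["orders"], "B": ["orders"]},
        [("A", "B"), ("B", "A")],
    )
    for alias in sorted(layout):
        print(alias, sorted(layout[alias]))
```

Under this model, a consumer in Region A that wants the global view would subscribe to both orders and B.orders.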
3. Hub-and-Spoke (Aggregation)
This topology is common when multiple local regional clusters (Spokes) collect data locally and replicate it to a central, massive data warehouse cluster (Hub) for global analytics, reporting, or long-term storage.
+--------------+
|   Spoke A    | ----+
+--------------+     |
                     |
+--------------+     v      +--------------+
|   Spoke B    | ---------> |     Hub      |
+--------------+     ^      |   Cluster    |
                     |      +--------------+
+--------------+     |
|   Spoke C    | ----+
+--------------+
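When default topic naming is used, each spoke's data arrives on the hub in its own alias-prefixed topic, so aggregate consumers typically subscribe by pattern. The Python sketch below builds such a pattern. The spoke aliases and the inventory topic are hypothetical, and the sketch assumes cluster aliases contain no dots.

```python
import re


def spoke_topic_pattern(base_topic):
    """Regex matching "<alias>.<base_topic>" for any alias without a dot."""
    return re.compile(rf"^[^.]+\.{re.escape(base_topic)}$")


# Hypothetical topic list on the hub cluster.
hub_topics = [
    "spoke-a.inventory",
    "spoke-b.inventory",
    "spoke-c.inventory",
    "inventory",          # a local hub topic, not a replicated one
    "spoke-a.returns",    # a different replicated topic
]

pattern = spoke_topic_pattern("inventory")
matched = [topic for topic in hub_topics if pattern.match(topic)]
```

Here matched contains only the three replicated inventory topics, which is the set an aggregation job on the hub would read.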
What is Kafka MirrorMaker?
Kafka MirrorMaker is the official tool shipped with Apache Kafka for replicating data between clusters. It acts as a bridge, consuming messages from a source cluster and producing them to a target cluster.
MirrorMaker 1 vs. MirrorMaker 2 (MM2)
The original MirrorMaker (MM1) was a simple consumer-producer tool. It suffered from significant limitations, such as failing to replicate topic configurations, failing to sync consumer offsets, and being prone to message duplication or loss during consumer group rebalances.
To address these limitations, Kafka 2.4 introduced MirrorMaker 2 (MM2). MM2 is built on top of the Kafka Connect framework, making it highly scalable, resilient, and declarative. Key features of MirrorMaker 2 include:
- Dynamic Topic Creation: Automatically creates topics in the target cluster with the same partition count and configurations as the source cluster.
- Offset Syncing: Translates consumer group offsets between clusters and, from Kafka 2.7 onward, can optionally write the translated offsets to the target cluster, allowing consumers to fail over without restarting from the beginning of the topic.
- Cycle Detection: Prevents infinite loops in Active-Active setups by prefixing replicated topic names with the source cluster's identifier (e.g., us-east.orders).
- Heartbeats and Monitoring: Emits heartbeat and checkpoint metrics to verify replication health and measure replication latency.
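The prefix-based cycle detection described in the list above can be modeled in a few lines of Python. This is a simplified sketch of the idea, not MM2's actual implementation: the function names are hypothetical, and it assumes the default "." separator.

```python
SEPARATOR = "."  # assumed default separator between alias and topic name


def format_remote_topic(source_alias, topic):
    """Name a replicated topic gets on the target cluster."""
    return f"{source_alias}{SEPARATOR}{topic}"


def is_cycle(topic, target_alias):
    """Return True if replicating `topic` into `target_alias` would send
    data back to a cluster it has already passed through."""
    current = topic
    while SEPARATOR in current:
        # Peel off one alias prefix per hop and compare it with the target.
        source, current = current.split(SEPARATOR, 1)
        if source == target_alias:
            return True
    return False


if __name__ == "__main__":
    replicated = format_remote_topic("us-east", "orders")
    print(replicated, is_cycle(replicated, "us-east"), is_cycle(replicated, "eu-west"))
```

One consequence of prefix-based detection is that a topic whose own name contains the separator (for example a hypothetical orders.eu) looks as if it came from a cluster called orders, so cluster aliases should not collide with topic name prefixes.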
Configuring MirrorMaker 2
MirrorMaker 2 can be run in dedicated mode using a bundled script, or as a connector inside an existing Kafka Connect cluster. Below is a practical configuration example for running MirrorMaker 2 in dedicated mode.
Example: Active-Passive Configuration
Save the following configuration as mm2.properties. In this scenario, we replicate data from a primary source cluster (primary) to a backup target cluster (backup).
# Define cluster aliases
clusters = primary, backup

# Connection details for the primary cluster
primary.bootstrap.servers = primary-broker1:9092,primary-broker2:9092

# Connection details for the backup cluster
backup.bootstrap.servers = backup-broker1:9092,backup-broker2:9092

# Define the replication flow (primary to backup is enabled)
primary->backup.enabled = true
primary->backup.topics = orders.*, customers

# Disable replication from backup to primary (Active-Passive)
backup->primary.enabled = false

# Replication parameters
primary->backup.sync.topic.configs.enabled = true
primary->backup.sync.topic.acls.enabled = true
primary->backup.emit.heartbeats.enabled = true
primary->backup.emit.checkpoints.enabled = true

# Consumer offset synchronization settings (requires Kafka 2.7 or later)
primary->backup.sync.group.offsets.enabled = true
primary->backup.sync.group.offsets.interval.seconds = 10

# Replication factors for MM2's internal topics
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
To start MirrorMaker 2 with this configuration, execute the following command in your terminal:
bin/connect-mirror-maker.sh mm2.properties
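Before starting the process, it can help to sanity-check which replication flows a properties file actually enables. The following is a minimal Python sketch, assuming simple key = value lines like those in the example configuration. It is not a full Java properties parser, and both helper names are hypothetical.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def enabled_flows(props):
    """Return (source, target) pairs whose `<source>-><target>.enabled` is true."""
    clusters = [c.strip() for c in props.get("clusters", "").split(",") if c.strip()]
    flows = []
    for source in clusters:
        for target in clusters:
            flag = props.get(f"{source}->{target}.enabled", "false")
            if source != target and flag.lower() == "true":
                flows.append((source, target))
    return flows


# Inline sample mirroring the Active-Passive example.
SAMPLE = """
clusters = primary, backup
primary->backup.enabled = true
backup->primary.enabled = false
"""

if __name__ == "__main__":
    print(enabled_flows(parse_properties(SAMPLE)))
```

For an Active-Passive setup the result should contain exactly one flow. Two entries would indicate that the reverse direction was left enabled by mistake.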
Understanding MM2 Internal Topics
When MirrorMaker 2 starts, it automatically creates several internal topics to manage state and track replication progress. Checkpoints are written to the target cluster, while the offset-syncs topic is created on the source cluster by default:
- heartbeats: Emitted periodically by MM2 to verify that the replication link between the source and target clusters is active and healthy.
- checkpoints (named <source-alias>.checkpoints.internal): Tracks the consumer group offsets for replicated topics. Each checkpoint pairs a group's committed offset in the source cluster with the corresponding translated offset in the target cluster, which is essential for consumer failover.
- offset-syncs (named mm2-offset-syncs.<target-alias>.internal): Stores mapping information between the source and target topic partition offsets, which MM2 uses to translate offsets when it builds checkpoints.
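To make the relationship between offset-syncs and checkpoints concrete, here is a deliberately conservative Python sketch of offset translation for a single partition. It is an assumption-laden model: the real translation logic in MM2 differs between Kafka versions, and the data layout and function name are hypothetical.

```python
from bisect import bisect_right


def translate_offset(syncs, upstream_committed):
    """Translate a committed source offset to a target offset.

    syncs: list of (upstream_offset, downstream_offset) pairs for one
           partition, sorted by upstream_offset.
    Returns the downstream offset of the latest sync at or before the
    committed offset, or None if no such sync exists yet.
    """
    upstream_points = [upstream for upstream, _ in syncs]
    index = bisect_right(upstream_points, upstream_committed) - 1
    if index < 0:
        return None  # nothing replicated up to this point is known yet
    # Conservative choice: never skip ahead of a known sync point, so the
    # consumer may re-read some records but will not miss any.
    return syncs[index][1]


# Hypothetical sync points: source offset 100 corresponds to target offset 95.
syncs = [(0, 0), (100, 95), (200, 190)]
translated = translate_offset(syncs, upstream_committed=150)
```

With these sample sync points, a group committed at source offset 150 is mapped back to the sync at 100, so it resumes on the target at offset 95 and re-reads a few records instead of skipping any.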
Real-World Use Cases
1. Disaster Recovery Failover
A financial institution runs its primary payment gateway in an on-premises data center. To prepare for natural disasters or power outages, they run a secondary cluster in a public cloud region. MirrorMaker 2 continuously replicates payment transactions. If the on-premises data center goes dark, the application's DNS is updated to point to the cloud Kafka cluster. Because MM2 translates consumer offsets, payment processing applications can resume close to where they left off instead of from the beginning of each topic. Offset translation is periodic and replication is asynchronous, so a small number of transactions may be redelivered after failover, and processing should therefore be idempotent.
2. Edge-to-Cloud Aggregation
A global logistics company operates local Kafka clusters on smart devices and gateway servers inside regional warehouses (the Spokes). These warehouses track inventory movements in real-time. MirrorMaker 2 replicates this localized data to a central cloud-based Kafka cluster (the Hub). The central cluster is used by data scientists to run global supply-chain optimization models and business intelligence dashboards.
Common Mistakes and How to Avoid Them
- Infinite Replication Loops: In Active-Active setups, if topic renaming is disabled, Cluster A will replicate a message to Cluster B, and Cluster B will see it as a new message and replicate it back to Cluster A. Always keep the default topic prefixing enabled (e.g., source.topic-name) or configure custom replication filters to prevent infinite loops.
- Ignoring Network Costs: Replicating massive volumes of data across cloud regions or out of on-premises environments can incur significant egress network costs. Use message compression (like Zstandard or Snappy) on your producers and configure MirrorMaker to compress payloads to minimize bandwidth consumption.
- Neglecting ACL and Schema Registry Syncing: MirrorMaker 2 can replicate topics, their configurations, and (when enabled) topic-level ACLs, but it does not replicate schemas stored in an external Schema Registry or other security settings such as user credentials. Ensure you run a synchronized Schema Registry deployment and manage the remaining ACLs and credentials across both clusters using infrastructure-as-code tools.
- Inadequate Monitoring of Replication Lag: If MirrorMaker falls behind, your target cluster will have stale data. Always monitor replication lag, for example through the metrics MM2 exposes over JMX (such as replication-latency-ms and record-age-ms), and chart them with tools like Prometheus and Grafana.
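Whichever signal you alert on, replication lag comes down to the gap between what the source cluster holds and what has reached the target. Below is a minimal Python sketch of that calculation. It assumes you already have per-partition snapshots of the source log-end offsets and of the last source offset replicated. Both inputs, the threshold, and the function names are hypothetical, and how the snapshots are collected is out of scope here.

```python
def partition_lag(source_end_offsets, replicated_offsets):
    """Per-partition lag: source log-end offset minus last replicated offset."""
    return {
        partition: max(0, end - replicated_offsets.get(partition, 0))
        for partition, end in source_end_offsets.items()
    }


def lagging_partitions(lag, threshold):
    """Partitions whose lag exceeds the alert threshold, in sorted order."""
    return sorted(p for p, value in lag.items() if value > threshold)


# Hypothetical snapshots keyed by (topic, partition).
source_end = {("orders", 0): 1_000, ("orders", 1): 5_000}
replicated = {("orders", 0): 990, ("orders", 1): 1_200}

lag = partition_lag(source_end, replicated)
alerts = lagging_partitions(lag, threshold=500)
```

In this sample, only partition 1 of orders exceeds the threshold. Alerting per partition rather than on a cluster-wide total makes a single stuck partition easier to spot.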
Interview Notes
Question: What is the difference between MirrorMaker 1 and MirrorMaker 2?
Answer: MirrorMaker 1 was a basic consumer-producer command-line tool that did not support dynamic topic creation, offset translation, or partition alignment, leading to data loss and manual operational overhead. MirrorMaker 2 is built on the Kafka Connect framework, making it highly scalable. It automatically syncs topic configurations, manages consumer offsets across clusters, detects replication loops, and provides detailed monitoring metrics.
Question: How does MirrorMaker 2 prevent infinite loops in an Active-Active deployment?
Answer: By default, MirrorMaker 2 uses a naming strategy that prepends the source cluster's alias to the replicated topic name (e.g., us-east.orders). MM2 detects these prefixes and automatically stops replicating a topic back to a cluster that already matches the prefix, preventing circular replication.
Question: Is replication via MirrorMaker synchronous or asynchronous?
Answer: MirrorMaker replication is entirely asynchronous. Producers write to the source cluster first, and MirrorMaker reads those messages as a consumer before writing them to the target cluster. This means there is always a non-zero replication lag between the clusters.
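A practical consequence of asynchronous replication is that records written to the source but not yet replicated are unavailable on the target if the source fails. A rough back-of-the-envelope estimate in Python, with purely illustrative numbers and a hypothetical helper name:

```python
def records_at_risk(produce_rate_per_sec, replication_lag_sec):
    """Estimate records present on the source but not yet on the target."""
    return int(produce_rate_per_sec * replication_lag_sec)


# Illustrative figures only: 5,000 records/s with 2.5 s of replication lag.
estimate = records_at_risk(produce_rate_per_sec=5_000, replication_lag_sec=2.5)
```

This kind of estimate is useful when discussing recovery point objectives, since it ties an abstract lag figure to a concrete amount of data.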
Summary
Multi-cluster Kafka deployments are essential for building resilient, low-latency, and compliant data architectures. While managing multiple clusters introduces complexity, Apache Kafka MirrorMaker 2 provides a robust, Connect-based solution for cross-cluster replication. By understanding topologies like Active-Passive and Hub-and-Spoke, properly configuring MM2 properties, and monitoring replication lag, you can design a highly reliable global data pipeline that withstands regional failures and optimizes data access for global users.