Banking Use Case: Real-Time Transaction Processing
Requirements
- 50 million transactions/day
- Peak TPS: 10,000+
- No transaction loss
- Account-level ordering
- Disaster recovery
- Regulatory audit compliance
- Exactly-once settlement
High-Level Architecture
ATM
Mobile Banking
Internet Banking
UPI Gateway
Card Network
|
v
Transaction Service
|
v
Kafka Topic: bank-transactions
Partitions: 48
Replication Factor: 3
min.insync.replicas: 2
|
+------------------+
| |
v v
Fraud Detection Settlement Engine
Consumer Group A Consumer Group B
|
v
Database
|
v
Reporting / Audit / Analytics
Why 48 Partitions?
Before deciding partition count, understand Kafka partitions in detail: Working with Kafka Topics and Partitions .
Interview Answer
We estimated peak throughput at approximately 10,000 TPS. Based on consumer processing capacity and expected growth, we provisioned 48 partitions. Partition count was selected to support horizontal scaling while maintaining account-level ordering using account number as the message key.
Partitioning Strategy
ProducerRecord<String, TransactionEvent> record =
new ProducerRecord<>(
"bank-transactions",
transaction.getAccountNumber(),
transaction);
Kafka internally determines the target partition using the message key.
partition = hash(accountNumber) % 48
Ordering Guarantee Example
Account 10001 Tx1 Tx2 Tx3
Since the account number is always used as the Kafka key, all transactions for the same account are routed to the same partition. Kafka guarantees ordering within a partition, ensuring transactions are processed in the exact order they were produced.
Benefits of 48 Partitions
- Supports horizontal scaling of consumers.
- Provides sufficient throughput for 10,000+ TPS workloads.
- Maintains account-level ordering.
- Enables parallel processing across multiple consumers.
- Allows future growth without immediate repartitioning.
- Distributes workload evenly across Kafka brokers.
Architect-Level Explanation
For a banking workload processing approximately 50 million transactions per day with peak throughput exceeding 10,000 transactions per second, we selected 48 partitions to balance scalability and ordering requirements. Using the account number as the Kafka message key ensures all transactions for a specific account are routed to the same partition, preserving transaction order. The partition count also provides sufficient parallelism for multiple consumer groups such as settlement, fraud detection, audit, analytics, and notification services while maintaining fault tolerance and future scalability.
Enterprise Kafka Producer Design (Banking Use Case)
Producer
ProducerRecord<String, TransactionEvent> record =
new ProducerRecord<>(
"bank-transactions",
transaction.getAccountNumber(),
transaction);
Kafka Internally
partition = hash(accountNumber) % 48
Ordering Guarantee
Using the account number as the Kafka message key ensures that all transactions belonging to the same account are routed to the same partition. Since Kafka guarantees ordering within a partition, account-level ordering is preserved.
Account 10001 Tx1 Tx2 Tx3
All transactions are processed in the exact order they were produced.
Producer Configuration
For a deeper understanding of producer internals and message publishing, see: Understanding Kafka Producers: Sending Messages .
For banking-grade reliability:
acks=all enable.idempotence=true retries=Integer.MAX_VALUE max.in.flight.requests.per.connection=5
Configuration Explanation
acks=all
The producer waits until all in-sync replicas acknowledge the write before considering the message successfully published.
Leader Follower1 Follower2 All acknowledge Producer receives success
This significantly reduces the risk of data loss if a broker fails immediately after accepting a write.
enable.idempotence=true
Idempotent producers prevent duplicate messages during retry scenarios. Kafka assigns a Producer ID and sequence number to ensure duplicate writes are ignored.
Interview Statement:
We enabled producer idempotency to prevent duplicate transaction events during network failures and retry scenarios.
retries=Integer.MAX_VALUE
Allows the producer to retry indefinitely for transient failures such as network issues or broker unavailability.
max.in.flight.requests.per.connection=5
Allows multiple requests to be sent concurrently while maintaining idempotency guarantees.
Topic Configuration
kafka-topics.sh \ --create \ --topic bank-transactions \ --partitions 48 \ --replication-factor 3
Broker Configuration
min.insync.replicas=2 unclean.leader.election.enable=false
Why?
min.insync.replicas=2
At least two replicas must acknowledge writes.
Leader + 1 Replica
This ensures durability even when a broker fails.
unclean.leader.election.enable=false
Prevents stale replicas from becoming leaders.
This is critical for banking systems where consistency is more important than availability.
Consumer Group Design
Consumer groups and partition assignments are explained in: Kafka Consumer Groups and Rebalancing .
Instead of a single consumer group:
settlement-group
Enterprise systems typically use multiple consumer groups:
fraud-group settlement-group audit-group analytics-group notification-group
Each consumer group receives a complete copy of the event stream and processes transactions independently.
Topic | +--> Fraud Team | +--> Settlement Team | +--> Audit Team | +--> Analytics Team | +--> Notification Team
Batch Consumer Design
Processing one message at a time can become expensive at high throughput.
Configuration
max.poll.records=1000
Batch Listener Example
@KafkaListener(
topics = "bank-transactions",
containerFactory = "batchFactory")
public void process(
List<TransactionEvent> events) {
}
Why Batch Processing?
1000 Database Calls vs 1 Batch Insert
Batch processing reduces database overhead and significantly improves throughput.
Exactly Once Processing
Senior Interview Question:
What happens if the database transaction succeeds but the Kafka offset commit fails?
Without Protection
DB Insert Success Offset Commit Failed Consumer Restarts Same Message Reprocessed
This can result in duplicate settlement processing.
Solution
Transactional Outbox Pattern
Business Transaction + Outbox Record = Single Database Transaction
After commit, Debezium CDC publishes the outbox record to Kafka.
This is one of the most widely adopted patterns in modern banking architectures.
Retry Strategy
Never block Kafka consumer threads.
Bad Approach
while(true){
retry();
}
This blocks partition consumption and increases consumer lag.
Recommended Approach
Main Topic | v Retry Topic (5 min) | v Retry Topic (30 min) | v Dead Letter Topic (DLT)
Example Topics
bank-transactions bank-transactions-retry-5m bank-transactions-retry-30m bank-transactions-dlt
This approach isolates problematic messages without impacting healthy traffic.
Rebalancing Problem
Senior Interview Question:
What happens during a Kafka consumer rebalance?
Problem Scenario
Consumer-1 crashes Partitions reassigned Processing temporarily pauses
Traditional rebalancing can cause significant disruption.
Solution
partition.assignment.strategy= org.apache.kafka.clients.consumer.CooperativeStickyAssignor
Cooperative rebalancing minimizes partition movement and reduces application downtime.
Monitoring
Production banking systems require comprehensive monitoring.
Key Metrics
- Consumer Lag
- ISR Shrink Rate
- Broker Disk Usage
- Replication Latency
- Under Replicated Partitions
- DLT Count
- Producer Error Rate
Observability Stack
- Prometheus
- Grafana
- OpenTelemetry
- Apache Kafka Metrics
Enterprise Producer Requirements
The producer must demonstrate the following capabilities:
- Idempotent Producer
- Avro Serialization
- Schema Registry Integration
- acks=all Durability
- Partitioning by Account Number
- Retry Handling
- Correlation IDs
- Distributed Traceability
- High Throughput Configuration
Project Structure
src/main/java com.banking.transaction.producer βββ config β βββ KafkaProducerConfig.java β βββ avro β βββ TransactionEvent.avsc β βββ dto β βββ TransactionRequest.java β βββ service β βββ TransactionProducerService.java β βββ controller β βββ TransactionController.java β βββ BankingProducerApplication.java
Option 5 Overview
Option 5 represents a complete enterprise banking architecture using:
- Spring Boot 3
- Apache Kafka
- PostgreSQL
- Avro Serialization
- Schema Registry
- Retry Topics
- Dead Letter Topics
- Transactional Outbox Pattern
- Debezium CDC
- Docker Compose
- Prometheus Monitoring
- Grafana Dashboards
- OpenTelemetry Tracing
A complete implementation consists of dozens of classes and configuration files. Therefore, it is typically implemented incrementally.
Recommended Implementation Order
- Complete Producer Module
- Complete Consumer Module
- Transactional Outbox Pattern
- Debezium CDC Integration
- Retry Topics and DLT
- Schema Registry and Avro Evolution
- Docker Compose Infrastructure
- Observability and Monitoring
- Disaster Recovery Strategy
Architect-Level Interview Answer
In our banking platform, Kafka served as the central event backbone for real-time transaction processing. We configured transaction topics with 48 partitions and replication factor 3 to support high throughput and fault tolerance. Account number was used as the partition key to preserve ordering for all transactions belonging to the same account. Producers were configured with idempotence enabled and acks=all to eliminate duplicates and ensure durability. Multiple consumer groups independently handled settlement, fraud detection, auditing, analytics, and notifications. Batch consumers processed up to 1000 records per poll to improve database efficiency. Retry topics and dead-letter topics isolated problematic events without blocking healthy traffic. We implemented the Transactional Outbox Pattern with Debezium CDC to eliminate dual-write problems and achieve reliable event delivery. Operational visibility was provided through Prometheus, Grafana, and OpenTelemetry. The architecture successfully supported more than 10,000 transactions per second while meeting regulatory, audit, disaster recovery, and exactly-once processing requirements expected in large-scale banking systems.
Enterprise Banking Kafka Producer Module
Producer Capabilities
- Idempotent Producer
- Avro Serialization
- Schema Registry Integration
- acks=all Durability
- Partitioning by Account Number
- Retry Handling
- Correlation IDs
- Distributed Traceability
- High Throughput Configuration
Producer Module Structure
src/main/java com.banking.transaction.producer βββ config β βββ KafkaProducerConfig.java β βββ avro β βββ TransactionEvent.avsc β βββ dto β βββ TransactionRequest.java β βββ service β βββ TransactionProducerService.java β βββ controller β βββ TransactionController.java β βββ BankingProducerApplication.java
1. Avro Schema
TransactionEvent.avsc
{
"type": "record",
"name": "TransactionEvent",
"namespace": "com.banking.avro",
"fields": [
{
"name": "transactionId",
"type": "string"
},
{
"name": "accountNumber",
"type": "string"
},
{
"name": "amount",
"type": "double"
},
{
"name": "transactionType",
"type": "string"
},
{
"name": "createdAt",
"type": "string"
}
]
}
Generated Class
com.banking.avro.TransactionEvent
The Maven Avro plugin automatically generates strongly typed Java classes from the schema definition.
2. DTO
TransactionRequest.java
package com.banking.transaction.producer.dto;
import java.math.BigDecimal;
public class TransactionRequest {
private String accountNumber;
private BigDecimal amount;
private String transactionType;
public String getAccountNumber() {
return accountNumber;
}
public void setAccountNumber(String accountNumber) {
this.accountNumber = accountNumber;
}
public BigDecimal getAmount() {
return amount;
}
public void setAmount(BigDecimal amount) {
this.amount = amount;
}
public String getTransactionType() {
return transactionType;
}
public void setTransactionType(String transactionType) {
this.transactionType = transactionType;
}
}
The DTO represents incoming transaction requests from REST clients.
3. Kafka Producer Configuration
KafkaProducerConfig.java
package com.banking.transaction.producer.config;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.HashMap;
import java.util.Map;
@Configuration
public class KafkaProducerConfig {
@Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props =
new HashMap<>();
props.put(
ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
"localhost:9092");
props.put(
ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
StringSerializer.class);
props.put(
ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
KafkaAvroSerializer.class);
props.put(
"schema.registry.url",
"http://localhost:8081");
props.put(
ProducerConfig.ACKS_CONFIG,
"all");
props.put(
ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG,
true);
props.put(
ProducerConfig.RETRIES_CONFIG,
Integer.MAX_VALUE);
props.put(
ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,
5);
props.put(
ProducerConfig.COMPRESSION_TYPE_CONFIG,
"snappy");
props.put(
ProducerConfig.BATCH_SIZE_CONFIG,
65536);
props.put(
ProducerConfig.LINGER_MS_CONFIG,
10);
return props;
}
}
Why These Producer Settings?
acks=all
Leader Follower1 Follower2 All ACK Producer Success
The producer waits until all in-sync replicas acknowledge the write before considering the transaction successful.
This minimizes the risk of data loss in the event of broker failures.
enable.idempotence=true
Ensures duplicate transaction events are not written during retry scenarios.
Retry Occurs Same Transaction Stored Only Once
compression.type=snappy
Reduces network traffic and broker storage utilization while maintaining good compression speed.
linger.ms=10
The producer waits for 10 milliseconds before sending records.
More Records + Larger Batch = Higher Throughput
This improves throughput by reducing network calls.
Producer Service
TransactionProducerService.java
package com.banking.transaction.producer.service;
import com.banking.avro.TransactionEvent;
import lombok.RequiredArgsConstructor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;
import java.time.Instant;
import java.util.UUID;
@Service
@RequiredArgsConstructor
public class TransactionProducerService {
private final KafkaTemplate<String,
TransactionEvent> kafkaTemplate;
public void publishTransaction(
String accountNumber,
Double amount,
String transactionType) {
String transactionId =
UUID.randomUUID().toString();
TransactionEvent event =
TransactionEvent.newBuilder()
.setTransactionId(transactionId)
.setAccountNumber(accountNumber)
.setAmount(amount)
.setTransactionType(transactionType)
.setCreatedAt(
Instant.now().toString())
.build();
ProducerRecord<String,
TransactionEvent> record =
new ProducerRecord<>(
"bank-transactions",
accountNumber,
event);
record.headers().add(
"correlation-id",
UUID.randomUUID()
.toString()
.getBytes());
kafkaTemplate.send(record)
.whenComplete(
(result, ex) -> {
if (ex != null) {
System.out.println(
"FAILED : "
+ transactionId);
} else {
System.out.println(
"SUCCESS : "
+ transactionId
+ " partition="
+ result
.getRecordMetadata()
.partition());
}
});
}
}
Why Account Number as Kafka Key?
new ProducerRecord<>(
"bank-transactions",
accountNumber,
event
)
Kafka uses the message key to determine the target partition.
hash(accountNumber) % partitions
Example
Account 10001 Tx1 Tx2 Tx3
All transactions for the same account are routed to the same partition, preserving transaction order.
Always Same Partition Ordering Preserved
Correlation IDs and Traceability
Every event contains a correlation identifier in Kafka headers.
record.headers().add(
"correlation-id",
UUID.randomUUID()
.toString()
.getBytes());
Correlation IDs enable distributed tracing across multiple microservices.
Example flow:
Transaction Service
|
v
Fraud Service
|
v
Settlement Service
|
v
Notification Service
The same correlation ID can be tracked across all services.
REST Controller
TransactionController.java
package com.banking.transaction.producer.controller;
import com.banking.transaction.producer.dto.TransactionRequest;
import com.banking.transaction.producer.service.TransactionProducerService;
import lombok.RequiredArgsConstructor;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
@RestController
@RequestMapping("/transactions")
@RequiredArgsConstructor
public class TransactionController {
private final TransactionProducerService
producerService;
@PostMapping
public ResponseEntity<String> publish(
@RequestBody
TransactionRequest request) {
producerService.publishTransaction(
request.getAccountNumber(),
request.getAmount().doubleValue(),
request.getTransactionType());
return ResponseEntity.ok(
"Transaction Published");
}
}
Application Class
BankingProducerApplication.java
package com.banking.transaction.producer;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class BankingProducerApplication {
public static void main(String[] args) {
SpringApplication.run(
BankingProducerApplication.class,
args);
}
}
application.yml
spring:
kafka:
producer:
bootstrap-servers: localhost:9092
properties:
schema.registry.url:
http://localhost:8081
acks: all
application:
name: banking-producer
High Throughput Configuration Summary
| Property | Purpose |
|---|---|
| acks=all | Maximum durability |
| enable.idempotence=true | Duplicate prevention |
| retries=Integer.MAX_VALUE | Automatic recovery |
| max.in.flight.requests.per.connection=5 | Parallel requests |
| compression.type=snappy | Lower network usage |
| batch.size=65536 | Larger producer batches |
| linger.ms=10 | Batch optimization |
Interview Explanation (15+ Years)
When asked to explain the Kafka producer design, a senior engineer should highlight durability, ordering, traceability, and scalability.
We configured the producer with acks=all and idempotence enabled to guarantee durability and avoid duplicates. Events were serialized using Avro and registered in Schema Registry to support schema evolution. Account number was used as the message key to ensure ordering for all transactions belonging to the same account. Correlation IDs were attached to Kafka headers for distributed tracing across microservices. Compression and batching parameters were tuned to maximize throughput while reducing network utilization. The producer publishes immutable transaction events to a partitioned Kafka topic with replication factor 3, enabling fault tolerance, horizontal scalability, and regulatory-grade reliability for banking workloads.
Enterprise Kafka Consumer Architecture for Banking Systems
In enterprise banking platforms, Kafka consumers are responsible for critical workloads such as settlement processing, fraud detection, auditing, analytics, and notifications. Unlike producers, consumers must handle duplicate messages, retry failures, dead-letter processing, manual offset management, and high-throughput database persistence. This article demonstrates a production-grade Kafka consumer implementation suitable for large-scale banking systems.
Why the Consumer is More Important Than the Producer
For senior-level interviews (15+ years experience), the consumer side is often considered more important because it is responsible for business processing and data consistency.
- Batch Processing
- Idempotency
- Retry Topics
- Dead Letter Topic (DLT)
- Manual Offset Commit
- PostgreSQL Persistence
- High Throughput Processing
- Fault Tolerance
Consumer Module Structure
com.banking.transaction.consumer βββ config β βββ KafkaConsumerConfig.java β βββ KafkaErrorHandlerConfig.java β βββ entity β βββ TransactionEntity.java β βββ ProcessedEventEntity.java β βββ repository β βββ TransactionRepository.java β βββ ProcessedEventRepository.java β βββ service β βββ SettlementConsumerService.java β βββ BankingConsumerApplication.java
Transaction Entity
TransactionEntity.java
@Entity
@Table(name = "transactions")
public class TransactionEntity {
@Id
private String transactionId;
private String accountNumber;
private BigDecimal amount;
private String transactionType;
private Instant createdAt;
}
The transaction table stores successfully processed settlement records.
Idempotency Table
Idempotency is one of the most important concepts in Kafka-based banking systems.
ProcessedEventEntity.java
@Entity
@Table(name = "processed_events")
public class ProcessedEventEntity {
@Id
private String eventId;
}
Why Is This Required?
Transaction TX1001 Processed DB Commit Success Offset Commit Failed Consumer Restarted Kafka Delivers TX1001 Again
Without idempotency protection, the same transaction may be processed multiple times, resulting in duplicate account credits or debits.
Account Credited Twice Financial Inconsistency
Repository Layer
TransactionRepository.java
public interface TransactionRepository
extends JpaRepository<
TransactionEntity,
String> {
}
ProcessedEventRepository.java
public interface ProcessedEventRepository
extends JpaRepository<
ProcessedEventEntity,
String> {
}
Consumer Configuration
KafkaConsumerConfig.java
@Configuration
public class KafkaConsumerConfig {
@Bean
public ConcurrentKafkaListenerContainerFactory<
String,
TransactionEvent> batchFactory(
ConsumerFactory consumerFactory) {
ConcurrentKafkaListenerContainerFactory<
String,
TransactionEvent> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(
consumerFactory);
factory.setBatchListener(true);
return factory;
}
}
application.yml
spring:
kafka:
consumer:
group-id: settlement-group
enable-auto-commit: false
auto-offset-reset: earliest
max-poll-records: 1000
properties:
schema.registry.url:
http://localhost:8081
specific.avro.reader: true
listener:
ack-mode: manual
type: batch
concurrency: 6
Why Batch Processing?
Traditional Approach 1000 Records = 1000 Database Inserts
Batch Approach 1000 Records = 1 Batch Database Operation
Batch processing dramatically reduces database round trips and improves throughput for high-volume banking workloads.
Error Handling and Retry Topics
KafkaErrorHandlerConfig.java
@Configuration
public class KafkaErrorHandlerConfig {
@Bean
public DefaultErrorHandler errorHandler(
KafkaTemplate
Retry Flow
bank-transactions
|
v
Retry Attempt 1
|
v
Retry Attempt 2
|
v
Retry Attempt 3
|
v
bank-transactions.DLT
Messages that continue to fail after configured retries are routed to a Dead Letter Topic (DLT) for investigation and replay.
Settlement Consumer Service
SettlementConsumerService.java
@Service
@RequiredArgsConstructor
public class SettlementConsumerService {
private final TransactionRepository
transactionRepository;
private final ProcessedEventRepository
processedEventRepository;
@Transactional
@KafkaListener(
topics = "bank-transactions",
containerFactory = "batchFactory")
public void processTransactions(
List events,
Acknowledgment acknowledgment) {
// Process batch
acknowledgment.acknowledge();
}
}
Consumer Processing Flow
Kafka Topic
|
v
Batch Consumer
|
v
Idempotency Check
|
v
Database Transaction
|
v
Save Processed Events
|
v
Manual Offset Commit
|
v
Success
What Happens Internally?
Step 1
Kafka Delivers TX1 TX2 TX3 TX4
Step 2
Check Idempotency Table
SELECT * FROM processed_events WHERE event_id='TX1';
If the event exists, processing is skipped.
Step 3
Batch Insert Transactions
INSERT INTO transactions
VALUES (...),
(...),
(...),
(...);
Step 4
Save Processed Events
INSERT INTO processed_events
VALUES ('TX1'),
('TX2'),
('TX3');
Step 5
Manual Offset Commit
acknowledgment.acknowledge();
Offsets are committed only after successful database persistence.
Kafka Topic Configuration
kafka-topics.sh \ --create \ --topic bank-transactions \ --partitions 48 \ --replication-factor 3
Interview Questions and Answers
Why Disable Auto Commit?
Auto commit may acknowledge offsets before database persistence completes. If the application crashes after offset commit but before database commit, messages are permanently lost. Manual acknowledgment ensures offsets are committed only after successful processing.
Why Use an Idempotency Table?
Kafka guarantees at-least-once delivery. Duplicate message delivery is possible during failures and consumer restarts. The processed_events table prevents duplicate settlement operations.
Why Use Batch Consumers?
Batch consumers reduce database round trips, network overhead, and transaction costs, resulting in significantly higher throughput.
Why Use a Dead Letter Topic?
Poison messages should not block partition consumption. Failed messages are isolated into a DLT where they can be analyzed and replayed later.
Production Banking Enhancements
- Schema Registry Integration
- Avro Serialization
- Distributed Tracing with Correlation IDs
- Prometheus Metrics
- Grafana Dashboards
- OpenTelemetry Tracing
- Retry Topics with Exponential Backoff
- Consumer Lag Monitoring
- Cooperative Rebalancing
- Disaster Recovery Strategy
Architect-Level Summary
The consumer group processes transaction events in batches using manual offset management. Idempotency is enforced through a processed-events table, ensuring duplicate deliveries do not result in duplicate business processing. Database persistence occurs within a transaction, and Kafka offsets are acknowledged only after successful commit. Retry handling and dead-letter topics isolate problematic messages without affecting healthy traffic. This architecture provides the scalability, reliability, throughput, and fault tolerance required for modern banking systems processing millions of transactions per day.
The next enterprise-level pattern typically implemented after this design is the Transactional Outbox Pattern, which eliminates dual-write inconsistencies between PostgreSQL and Kafka.
Transactional Outbox Pattern (Banking-Grade)
For a 15+ years Architect/Principal Engineer interview, one of the most common Kafka questions is:
"How do you avoid the dual-write problem?"
The Problem
Suppose you do this:
accountRepository.save(account);
kafkaTemplate.send(
"bank-transactions",
transactionEvent);
Looks simple.
But what if:
DB Save = SUCCESS Kafka Publish = FAILED
Now:
- Money debited
- No event published
- Fraud system won't see it
- Audit won't see it
- Settlement won't see it
- System becomes inconsistent
Another Failure Scenario
Kafka Publish = SUCCESS DB Save = FAILED
Now Kafka contains an event that never happened.
Also bad.
Solution: Transactional Outbox
Instead of writing directly to Kafka:
Request
|
v
DB Transaction
|
+--> Account Table
|
+--> Outbox Table
|
COMMIT
After Commit
Outbox Publisher
|
v
Kafka Topic
Architecture
To understand the underlying broker architecture supporting this design: Understanding Apache Kafka Architecture and Core Concepts .
Client | v Transaction Service | +----------------------+ | DB Transaction | | | | Account Update | | Outbox Insert | +----------------------+ | COMMIT | v Outbox Publisher | v Kafka
Project Structure
com.banking.transaction
βββ entity
β βββ AccountEntity.java
β βββ OutboxEventEntity.java
β
βββ repository
β βββ AccountRepository.java
β βββ OutboxRepository.java
β
βββ service
β βββ BankingTransactionService.java
β βββ OutboxPublisherService.java
β
βββ scheduler
βββ OutboxScheduler.java
Account Entity
AccountEntity.java
package com.banking.transaction.entity;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import java.math.BigDecimal;
@Entity
@Table(name = "accounts")
public class AccountEntity {
@Id
private String accountNumber;
private BigDecimal balance;
public String getAccountNumber() {
return accountNumber;
}
public void setAccountNumber(
String accountNumber) {
this.accountNumber = accountNumber;
}
public BigDecimal getBalance() {
return balance;
}
public void setBalance(
BigDecimal balance) {
this.balance = balance;
}
}
Outbox Entity
OutboxEventEntity.java
package com.banking.transaction.entity;
import jakarta.persistence.*;
import java.time.Instant;
@Entity
@Table(name = "outbox_events")
public class OutboxEventEntity {
@Id
private String eventId;
private String aggregateId;
private String eventType;
@Column(length = 10000)
private String payload;
private String status;
private Instant createdAt;
public String getEventId() {
return eventId;
}
public void setEventId(
String eventId) {
this.eventId = eventId;
}
public String getAggregateId() {
return aggregateId;
}
public void setAggregateId(
String aggregateId) {
this.aggregateId = aggregateId;
}
public String getEventType() {
return eventType;
}
public void setEventType(
String eventType) {
this.eventType = eventType;
}
public String getPayload() {
return payload;
}
public void setPayload(
String payload) {
this.payload = payload;
}
public String getStatus() {
return status;
}
public void setStatus(
String status) {
this.status = status;
}
public Instant getCreatedAt() {
return createdAt;
}
public void setCreatedAt(
Instant createdAt) {
this.createdAt = createdAt;
}
}
Outbox Table
CREATE TABLE outbox_events ( event_id VARCHAR(100) PRIMARY KEY, aggregate_id VARCHAR(100), event_type VARCHAR(50), payload TEXT, status VARCHAR(20), created_at TIMESTAMP );
Repository
OutboxRepository.java
package com.banking.transaction.repository;
import com.banking.transaction.entity
.OutboxEventEntity;
import org.springframework.data.jpa.repository
.JpaRepository;
import java.util.List;
public interface OutboxRepository
extends JpaRepository<
OutboxEventEntity,
String> {
List
findTop100ByStatusOrderByCreatedAt(
String status);
}
Business Service
BankingTransactionService.java
Critical Part
package com.banking.transaction.service;
import com.banking.transaction.entity.AccountEntity;
import com.banking.transaction.entity.OutboxEventEntity;
import com.banking.transaction.repository
.AccountRepository;
import com.banking.transaction.repository
.OutboxRepository;
import com.fasterxml.jackson.databind
.ObjectMapper;
import org.springframework.stereotype.Service;
import org.springframework.transaction
.annotation.Transactional;
import java.math.BigDecimal;
import java.time.Instant;
import java.util.UUID;
@Service
public class BankingTransactionService {
private final AccountRepository
accountRepository;
private final OutboxRepository
outboxRepository;
private final ObjectMapper objectMapper;
public BankingTransactionService(
AccountRepository accountRepository,
OutboxRepository outboxRepository,
ObjectMapper objectMapper) {
this.accountRepository =
accountRepository;
this.outboxRepository =
outboxRepository;
this.objectMapper =
objectMapper;
}
@Transactional
public void transferMoney(
String accountNumber,
BigDecimal amount)
throws Exception {
AccountEntity account =
accountRepository
.findById(accountNumber)
.orElseThrow();
account.setBalance(
account.getBalance()
.subtract(amount));
accountRepository.save(account);
OutboxEventEntity outbox =
new OutboxEventEntity();
outbox.setEventId(
UUID.randomUUID()
.toString());
outbox.setAggregateId(
accountNumber);
outbox.setEventType(
"MONEY_DEBITED");
outbox.setPayload(
objectMapper
.writeValueAsString(
account));
outbox.setStatus(
"NEW");
outbox.setCreatedAt(
Instant.now());
outboxRepository.save(outbox);
}
}
Why Is This Safe?
Everything happens inside:
@Transactional
Either:
Account Update SUCCESS Outbox Insert SUCCESS
OR
Both Rollback
The system never reaches an inconsistent state.
Outbox Publisher
OutboxPublisherService.java
package com.banking.transaction.service;
import com.banking.transaction.entity
.OutboxEventEntity;
import com.banking.transaction.repository
.OutboxRepository;
import lombok.RequiredArgsConstructor;
import org.springframework.kafka.core
.KafkaTemplate;
import org.springframework.stereotype.Service;
@Service
@RequiredArgsConstructor
public class OutboxPublisherService {
private final KafkaTemplate<
String,
String> kafkaTemplate;
private final OutboxRepository
outboxRepository;
public void publish(
OutboxEventEntity event) {
kafkaTemplate.send(
"bank-transactions",
event.getAggregateId(),
event.getPayload());
event.setStatus(
"PUBLISHED");
outboxRepository.save(event);
}
}
Scheduler
Poll unpublished events.
OutboxScheduler.java
package com.banking.transaction.scheduler;
import com.banking.transaction.entity
.OutboxEventEntity;
import com.banking.transaction.repository
.OutboxRepository;
import com.banking.transaction.service
.OutboxPublisherService;
import lombok.RequiredArgsConstructor;
import org.springframework.scheduling
.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.util.List;
@Component
@RequiredArgsConstructor
public class OutboxScheduler {
private final OutboxRepository
outboxRepository;
private final OutboxPublisherService
publisherService;
@Scheduled(fixedDelay = 5000)
public void publishEvents() {
List events =
outboxRepository
.findTop100ByStatusOrderByCreatedAt(
"NEW");
for (OutboxEventEntity event
: events) {
publisherService.publish(
event);
}
}
}
Production Version
Senior-level systems typically replace polling with:
- Debezium
- Kafka Connect
Flow
DB Commit
|
v
Outbox Table
|
v
Debezium CDC
|
v
Kafka Topic
No scheduler required.
Interview Answer (15+ Years)
We used the Transactional Outbox Pattern to eliminate the dual-write problem between the database and Kafka. Business state changes and outbox records were persisted within the same database transaction. After commit, a publisher process (or Debezium CDC in production) published outbox events to Kafka. This guaranteed that every committed business transaction produced an event, and no event could exist without a corresponding database change. The pattern provided strong consistency, auditability, and reliable event delivery required in banking systems.
Architect-Level Comparison
| Approach | Risk |
|---|---|
| DB Save β Kafka Send | Dual-write inconsistency |
| Kafka Send β DB Save | Phantom events |
| XA / 2PC | Complex, poor scalability |
| Transactional Outbox | Preferred for modern banking microservices |
Key Takeaway
The Transactional Outbox Pattern is one of the most frequently discussed patterns in senior Kafka, microservices, event-driven architecture, and banking-system interviews because it solves the dual-write problem without introducing distributed transaction complexity.
Docker Compose for Enterprise Kafka Banking Platform
A senior interviewer may ask:
"How would you bring up a complete Kafka platform locally for development or lower environments?"
A Typical Enterprise Stack Includes
+--------------------------------------------------+ | Banking Platform | +--------------------------------------------------+ Spring Boot Producer Spring Boot Consumer PostgreSQL Kafka Cluster Schema Registry Kafka UI Prometheus Grafana (Optional) Debezium Kafka Connect
Folder Structure
banking-platform
βββ docker-compose.yml
βββ prometheus
β βββ prometheus.yml
βββ grafana
β βββ dashboards
βββ applications
βββ producer
βββ consumer
docker-compose.yml
Kafka KRaft Mode (No Zookeeper)
version: '3.9'
services:
postgres:
image: postgres:16
container_name: postgres
environment:
POSTGRES_DB: bankingdb
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
kafka:
image: confluentinc/cp-kafka:7.7.0
container_name: kafka
hostname: kafka
ports:
- "9092:9092"
environment:
KAFKA_NODE_ID: 1
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:29093
KAFKA_LISTENERS: >
PLAINTEXT://0.0.0.0:9092,
CONTROLLER://0.0.0.0:29093
KAFKA_ADVERTISED_LISTENERS: >
PLAINTEXT://localhost:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: >
PLAINTEXT:PLAINTEXT,
CONTROLLER:PLAINTEXT
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
volumes:
- kafka-data:/var/lib/kafka/data
schema-registry:
image: confluentinc/cp-schema-registry:7.7.0
container_name: schema-registry
depends_on:
- kafka
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
kafka-ui:
image: provectuslabs/kafka-ui:latest
container_name: kafka-ui
depends_on:
- kafka
ports:
- "8080:8080"
environment:
KAFKA_CLUSTERS_0_NAME: banking-cluster
KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092
KAFKA_CLUSTERS_0_SCHEMAREGISTRY: http://schema-registry:8081
prometheus:
image: prom/prometheus
container_name: prometheus
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
grafana:
image: grafana/grafana
container_name: grafana
ports:
- "3000:3000"
environment:
GF_SECURITY_ADMIN_PASSWORD: admin
volumes:
postgres-data:
kafka-data:
Prometheus Configuration
prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'producer'
metrics_path: '/actuator/prometheus'
static_configs:
- targets:
- host.docker.internal:8085
- job_name: 'consumer'
metrics_path: '/actuator/prometheus'
static_configs:
- targets:
- host.docker.internal:8086
Create Kafka Topic
After Kafka starts:
docker exec -it kafka bash
Create Topic:
kafka-topics \ --create \ --topic bank-transactions \ --partitions 48 \ --replication-factor 1 \ --bootstrap-server localhost:9092
Verify Topic
kafka-topics \ --describe \ --topic bank-transactions \ --bootstrap-server localhost:9092
Example:
Topic: bank-transactions Partitions: 48 Replication Factor: 1
Schema Registry Check
http://localhost:8081
You should see registered Avro schemas after producers publish events.
Kafka UI
http://localhost:8080
You can:
- View Topics
- View Partitions
- View Consumer Groups
- View Messages
- Replay Messages
PostgreSQL Check
docker exec -it postgres psql -U postgres -d bankingdb
List tables:
\dt
Grafana
http://localhost:3000
Default Credentials:
username: admin password: admin
Metrics to Monitor
Kafka Metrics
- Consumer Lag
- Messages/sec
- Bytes In/sec
- Bytes Out/sec
- ISR Changes
- Under Replicated Partitions
Application Metrics
- Transaction Success Rate
- Failed Transactions
- DLT Count
- Retry Count
- DB Latency
- Kafka Publish Latency
Production Banking Version
Kafka Broker 1 Kafka Broker 2 Kafka Broker 3 Topic: Partitions = 48 Replication Factor = 3 min.insync.replicas = 2
This provides:
- Broker Failure Tolerance
- High Availability
- No Data Loss
Architect-Level Interview Answer
For local development we use Docker Compose to provision PostgreSQL, Kafka in KRaft mode, Schema Registry, Kafka UI, Prometheus, and Grafana. Kafka topics are created with partitioning strategies aligned to expected throughput. Producers use Avro and Schema Registry for schema evolution. Metrics are exported through Spring Boot Actuator and scraped by Prometheus. Grafana dashboards provide visibility into consumer lag, throughput, retries, and DLT volume. In production, the same architecture scales to a multi-broker Kafka cluster with replication, ISR configuration, and centralized observability.
Complete Enterprise Banking Kafka Architecture (15+ Years Level)
This is the level expected for a Principal Engineer, Solution Architect, Staff Engineer, or Engineering Manager.
Business Requirement
A customer performs:
Mobile Banking Transfer Account A β Account B βΉ10,000
Requirements:
- No money loss
- No duplicate processing
- Full audit trail
- Fraud detection
- Real-time notifications
- Regulatory compliance
- Disaster recovery
End-to-End Architecture
βββββββββββββββββββββββ
β Mobile Banking App β
ββββββββββββ¬βββββββββββ
β
βΌ
βββββββββββββββββββββββ
β API Gateway β
ββββββββββββ¬βββββββββββ
β
βΌ
βββββββββββββββββββββββ
β Transaction Service β
ββββββββββββ¬βββββββββββ
β
DB Transactionβ
βΌ
βββββββββββββββββββββββββββββββββ
β PostgreSQL β
β Accounts β
β Transactions β
β Outbox Events β
βββββββββββββββββ¬ββββββββββββββββ
β
βΌ
βββββββββββββββββββββββ
β Debezium CDC β
ββββββββββββ¬βββββββββββ
β
βΌ
βββββββββββββββββββββββ
β Apache Kafka β
β bank-transactions β
ββββββββ¬βββββββ¬ββββββββ
β β
βββββββββββββ βββββββββββββ
βΌ βΌ
ββββββββββββββββββββ ββββββββββββββββββββ
β Fraud Service β β Settlement β
β Consumer Group β β Consumer Group β
βββββββββββ¬βββββββββ βββββββββββ¬βββββββββ
β β
βΌ βΌ
Fraud Database Settlement Database
βββββββββββββββββββββββββββββββββ
βΌ βΌ
Notification Service Audit Service
Topic Design
Main Topic
bank-transactions
Configuration:
Partitions : 48 Replication Factor : 3 Min ISR : 2 Retention : 30 Days Cleanup Policy : delete
Why 48 Partitions?
We estimated peak throughput of approximately 10k TPS. A 48-partition topic provided sufficient horizontal scalability while maintaining ordering guarantees per account.
Producer Design
acks=all enable.idempotence=true retries=Integer.MAX_VALUE compression.type=snappy linger.ms=10 batch.size=65536
| Property | Purpose |
|---|---|
| acks=all | Durability |
| idempotence | No duplicates |
| retries | Network recovery |
| compression | Lower bandwidth |
| batching | Higher throughput |
Partitioning Strategy
Producer Key: accountNumber
Account 1001 Tx1 Tx2 Tx3
hash(accountNumber) % 48
All events go to the same partition. Ordering is preserved.
Schema Strategy
Learn more about Avro serialization and schema evolution: Working with Schema Registry and Avro .
Apache Avro Confluent Schema Registry
Example v1
{
"transactionId":"123",
"amount":1000
}
Example v2
{
"transactionId":"123",
"amount":1000,
"currency":"INR"
}
Compatibility Mode: BACKWARD
Consumers continue working after schema evolution.
Dual Write Problem Solution
Accounts Table Transactions Table Outbox Table Single DB Transaction Debezium CDC β Kafka
Consumer Design
Consumer Group: settlement-group
enable-auto-commit: false ack-mode: manual max-poll-records: 1000 concurrency: 6
Batch Processing
Consumer receives: 1000 transactions
Instead of:
1000 inserts
Do:
INSERT INTO transactions VALUES (...)
Batch insert provides significant performance gains.
Idempotency
Problem:
DB Success Offset Commit Failure Message Reprocessed
Without protection:
Account Credited Twice
Solution:
processed_events event_id PRIMARY KEY
SELECT * FROM processed_events WHERE event_id=?
Retry Architecture
bank-transactions
|
βΌ
retry-5m
|
βΌ
retry-30m
|
βΌ
DLT
Topics:
bank-transactions-retry-5m bank-transactions-retry-30m bank-transactions-dlt
DLT Replay
DLT β Main Topic
Controlled replay after issue resolution.
Exactly Once Semantics
Interview Question:
"How do you achieve exactly-once?"
Answer:
Kafka Alone Exactly Once in Kafka
Not necessarily:
Kafka β Database
Required:
- Idempotency
- Outbox Pattern
- Transactional Processing
Combined result: Effectively Exactly Once Processing.
Security
Authentication: SASL_SSL Authorization: ACLs Encryption: TLS 1.3 Secrets: Vault
Monitoring
Comprehensive Kafka monitoring strategies are covered in: Monitoring and Alerting in Apache Kafka Clusters .
- Consumer Lag
- DLT Count
- Retry Count
- Broker Health
- ISR Shrink
- Disk Usage
- Publish Latency
Tools:
Prometheus Grafana OpenTelemetry
Disaster Recovery
Cross-cluster replication and DR strategies are discussed in: Multi-Cluster Kafka Deployments and MirrorMaker .
Primary: Mumbai DC Secondary: Hyderabad DC Replication: MirrorMaker 2
Scaling Strategy
Current: 48 Partitions 6 Consumers Peak Traffic: 48 Partitions 24 Consumers
Scale horizontally with no code changes.
Senior Architect Interview Answer
We built an event-driven banking platform using Kafka as the central event backbone. Transaction services persisted business state and outbox records within the same PostgreSQL transaction, eliminating dual-write inconsistencies. Debezium CDC streamed committed outbox events to Kafka topics configured with 48 partitions and replication factor 3. Producers used idempotence and acks=all, while consumers processed events in batches with manual offset management. Idempotency tables prevented duplicate settlement, and retry topics with DLT handling isolated poison messages. Avro and Schema Registry enabled schema evolution, while Prometheus and Grafana provided operational visibility. The architecture supported high throughput, fault tolerance, auditability, and regulatory compliance requirements expected in large-scale banking systems.
This is the complete architectural story you can walk through in a senior Kafka/banking interviewβfrom API request all the way to settlement, fraud detection, monitoring, retries, and disaster recovery.
Frequently Asked Questions
Why use account number as Kafka key?
Using account number as the key guarantees ordering of transactions for the same account because Kafka preserves ordering within a partition.
Why not use database and Kafka in the same transaction?
Distributed transactions introduce complexity and scalability concerns. The Transactional Outbox Pattern provides a simpler and more reliable alternative.
Why is replication factor set to 3?
Replication factor 3 provides fault tolerance against broker failures while maintaining durability and availability.
How does Debezium solve the dual-write problem?
Debezium captures committed database changes and publishes them to Kafka, ensuring reliable event delivery after database commits.