Banking Use Case: Real-Time Transaction Processing

Requirements

50 million transactions/day
Peak TPS: 10,000+
No transaction loss
Account-level ordering
Disaster recovery
Regulatory audit compliance
Exactly-once settlement

High-Level Architecture

ATM
Mobile Banking
Internet Banking
UPI Gateway
Card Network
       |
       v
Transaction Service
       |
       v
Kafka Topic: bank-transactions
Partitions: 48
Replication Factor: 3
min.insync.replicas: 2
       |
       +------------------+
       |                  |
       v                  v
Fraud Detection     Settlement Engine
Consumer Group A    Consumer Group B

       |
       v
Database
       |
       v
Reporting / Audit / Analytics

Why 48 Partitions?

Before deciding partition count, understand Kafka partitions in detail: Working with Kafka Topics and Partitions .

Interview Answer

We estimated peak throughput at approximately 10,000 TPS. Based on consumer processing capacity and expected growth, we provisioned 48 partitions. Partition count was selected to support horizontal scaling while maintaining account-level ordering using account number as the message key.

Partitioning Strategy

ProducerRecord<String, TransactionEvent> record =
        new ProducerRecord<>(
                "bank-transactions",
                transaction.getAccountNumber(),
                transaction);

Kafka internally determines the target partition using the message key.

partition = hash(accountNumber) % 48

Ordering Guarantee Example

Account 10001

Tx1
Tx2
Tx3

Since the account number is always used as the Kafka key, all transactions for the same account are routed to the same partition. Kafka guarantees ordering within a partition, ensuring transactions are processed in the exact order they were produced.

Benefits of 48 Partitions

Supports horizontal scaling of consumers.
Provides sufficient throughput for 10,000+ TPS workloads.
Maintains account-level ordering.
Enables parallel processing across multiple consumers.
Allows future growth without immediate repartitioning.
Distributes workload evenly across Kafka brokers.

Architect-Level Explanation

For a banking workload processing approximately 50 million transactions per day with peak throughput exceeding 10,000 transactions per second, we selected 48 partitions to balance scalability and ordering requirements. Using the account number as the Kafka message key ensures all transactions for a specific account are routed to the same partition, preserving transaction order. The partition count also provides sufficient parallelism for multiple consumer groups such as settlement, fraud detection, audit, analytics, and notification services while maintaining fault tolerance and future scalability.

Enterprise Kafka Producer Design (Banking Use Case)

Producer

ProducerRecord<String, TransactionEvent> record =
        new ProducerRecord<>(
                "bank-transactions",
                transaction.getAccountNumber(),
                transaction);

Kafka Internally

partition = hash(accountNumber) % 48

Ordering Guarantee

Using the account number as the Kafka message key ensures that all transactions belonging to the same account are routed to the same partition. Since Kafka guarantees ordering within a partition, account-level ordering is preserved.

Account 10001

Tx1
Tx2
Tx3

All transactions are processed in the exact order they were produced.

Producer Configuration

For a deeper understanding of producer internals and message publishing, see: Understanding Kafka Producers: Sending Messages .

For banking-grade reliability:

acks=all
enable.idempotence=true
retries=Integer.MAX_VALUE
max.in.flight.requests.per.connection=5

Configuration Explanation

acks=all

The producer waits until all in-sync replicas acknowledge the write before considering the message successfully published.

Leader
Follower1
Follower2

All acknowledge

Producer receives success

This significantly reduces the risk of data loss if a broker fails immediately after accepting a write.

enable.idempotence=true

Idempotent producers prevent duplicate messages during retry scenarios. Kafka assigns a Producer ID and sequence number to ensure duplicate writes are ignored.

Interview Statement:

We enabled producer idempotency to prevent duplicate transaction events during network failures and retry scenarios.

retries=Integer.MAX_VALUE

Allows the producer to retry indefinitely for transient failures such as network issues or broker unavailability.

max.in.flight.requests.per.connection=5

Allows multiple requests to be sent concurrently while maintaining idempotency guarantees.

Topic Configuration

kafka-topics.sh \
--create \
--topic bank-transactions \
--partitions 48 \
--replication-factor 3

Broker Configuration

min.insync.replicas=2
unclean.leader.election.enable=false

Why?

min.insync.replicas=2

At least two replicas must acknowledge writes.

Leader + 1 Replica

This ensures durability even when a broker fails.

unclean.leader.election.enable=false

Prevents stale replicas from becoming leaders.

This is critical for banking systems where consistency is more important than availability.

Consumer Group Design

Consumer groups and partition assignments are explained in: Kafka Consumer Groups and Rebalancing .

Instead of a single consumer group:

settlement-group

Enterprise systems typically use multiple consumer groups:

fraud-group
settlement-group
audit-group
analytics-group
notification-group

Each consumer group receives a complete copy of the event stream and processes transactions independently.

Topic
 |
 +--> Fraud Team
 |
 +--> Settlement Team
 |
 +--> Audit Team
 |
 +--> Analytics Team
 |
 +--> Notification Team

Batch Consumer Design

Processing one message at a time can become expensive at high throughput.

Configuration

max.poll.records=1000

Batch Listener Example

@KafkaListener(
        topics = "bank-transactions",
        containerFactory = "batchFactory")
public void process(
        List<TransactionEvent> events) {

}

Why Batch Processing?

1000 Database Calls
vs
1 Batch Insert

Batch processing reduces database overhead and significantly improves throughput.

Exactly Once Processing

Senior Interview Question:

What happens if the database transaction succeeds but the Kafka offset commit fails?

Without Protection

DB Insert Success

Offset Commit Failed

Consumer Restarts

Same Message Reprocessed

This can result in duplicate settlement processing.

Solution

Transactional Outbox Pattern

Business Transaction
+
Outbox Record
=
Single Database Transaction

After commit, Debezium CDC publishes the outbox record to Kafka.

This is one of the most widely adopted patterns in modern banking architectures.

Retry Strategy

Never block Kafka consumer threads.

Bad Approach

while(true){
    retry();
}

This blocks partition consumption and increases consumer lag.

Recommended Approach

Main Topic
   |
   v
Retry Topic (5 min)
   |
   v
Retry Topic (30 min)
   |
   v
Dead Letter Topic (DLT)

Example Topics

bank-transactions
bank-transactions-retry-5m
bank-transactions-retry-30m
bank-transactions-dlt

This approach isolates problematic messages without impacting healthy traffic.

Rebalancing Problem

Senior Interview Question:

What happens during a Kafka consumer rebalance?

Problem Scenario

Consumer-1 crashes

Partitions reassigned

Processing temporarily pauses

Traditional rebalancing can cause significant disruption.

Solution

partition.assignment.strategy=
org.apache.kafka.clients.consumer.CooperativeStickyAssignor

Cooperative rebalancing minimizes partition movement and reduces application downtime.

Monitoring

Production banking systems require comprehensive monitoring.

Key Metrics

Consumer Lag
ISR Shrink Rate
Broker Disk Usage
Replication Latency
Under Replicated Partitions
DLT Count
Producer Error Rate

Observability Stack

Prometheus
Grafana
OpenTelemetry
Apache Kafka Metrics

Enterprise Producer Requirements

The producer must demonstrate the following capabilities:

Idempotent Producer
Avro Serialization
Schema Registry Integration
acks=all Durability
Partitioning by Account Number
Retry Handling
Correlation IDs
Distributed Traceability
High Throughput Configuration

Project Structure

src/main/java

com.banking.transaction.producer

├── config
│   └── KafkaProducerConfig.java
│
├── avro
│   └── TransactionEvent.avsc
│
├── dto
│   └── TransactionRequest.java
│
├── service
│   └── TransactionProducerService.java
│
├── controller
│   └── TransactionController.java
│
└── BankingProducerApplication.java

Option 5 Overview

Option 5 represents a complete enterprise banking architecture using:

Spring Boot 3
Apache Kafka
PostgreSQL
Avro Serialization
Schema Registry
Retry Topics
Dead Letter Topics
Transactional Outbox Pattern
Debezium CDC
Docker Compose
Prometheus Monitoring
Grafana Dashboards
OpenTelemetry Tracing

A complete implementation consists of dozens of classes and configuration files. Therefore, it is typically implemented incrementally.

Recommended Implementation Order

Complete Producer Module
Complete Consumer Module
Transactional Outbox Pattern
Debezium CDC Integration
Retry Topics and DLT
Schema Registry and Avro Evolution
Docker Compose Infrastructure
Observability and Monitoring
Disaster Recovery Strategy

Architect-Level Interview Answer

In our banking platform, Kafka served as the central event backbone for real-time transaction processing. We configured transaction topics with 48 partitions and replication factor 3 to support high throughput and fault tolerance. Account number was used as the partition key to preserve ordering for all transactions belonging to the same account. Producers were configured with idempotence enabled and acks=all to eliminate duplicates and ensure durability. Multiple consumer groups independently handled settlement, fraud detection, auditing, analytics, and notifications. Batch consumers processed up to 1000 records per poll to improve database efficiency. Retry topics and dead-letter topics isolated problematic events without blocking healthy traffic. We implemented the Transactional Outbox Pattern with Debezium CDC to eliminate dual-write problems and achieve reliable event delivery. Operational visibility was provided through Prometheus, Grafana, and OpenTelemetry. The architecture successfully supported more than 10,000 transactions per second while meeting regulatory, audit, disaster recovery, and exactly-once processing requirements expected in large-scale banking systems.

Enterprise Banking Kafka Producer Module

Producer Capabilities

Idempotent Producer
Avro Serialization
Schema Registry Integration
acks=all Durability
Partitioning by Account Number
Retry Handling
Correlation IDs
Distributed Traceability
High Throughput Configuration

Producer Module Structure

src/main/java

com.banking.transaction.producer

├── config
│   ├── KafkaProducerConfig.java
│
├── avro
│   └── TransactionEvent.avsc
│
├── dto
│   └── TransactionRequest.java
│
├── service
│   └── TransactionProducerService.java
│
├── controller
│   └── TransactionController.java
│
└── BankingProducerApplication.java

1. Avro Schema

TransactionEvent.avsc

{
  "type": "record",
  "name": "TransactionEvent",
  "namespace": "com.banking.avro",

  "fields": [
    {
      "name": "transactionId",
      "type": "string"
    },
    {
      "name": "accountNumber",
      "type": "string"
    },
    {
      "name": "amount",
      "type": "double"
    },
    {
      "name": "transactionType",
      "type": "string"
    },
    {
      "name": "createdAt",
      "type": "string"
    }
  ]
}

Generated Class

com.banking.avro.TransactionEvent

The Maven Avro plugin automatically generates strongly typed Java classes from the schema definition.

2. DTO

TransactionRequest.java

package com.banking.transaction.producer.dto;

import java.math.BigDecimal;

public class TransactionRequest {

    private String accountNumber;
    private BigDecimal amount;
    private String transactionType;

    public String getAccountNumber() {
        return accountNumber;
    }

    public void setAccountNumber(String accountNumber) {
        this.accountNumber = accountNumber;
    }

    public BigDecimal getAmount() {
        return amount;
    }

    public void setAmount(BigDecimal amount) {
        this.amount = amount;
    }

    public String getTransactionType() {
        return transactionType;
    }

    public void setTransactionType(String transactionType) {
        this.transactionType = transactionType;
    }
}

The DTO represents incoming transaction requests from REST clients.

3. Kafka Producer Configuration

KafkaProducerConfig.java

package com.banking.transaction.producer.config;

import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.HashMap;
import java.util.Map;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public Map<String, Object> producerConfigs() {

        Map<String, Object> props =
                new HashMap<>();

        props.put(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092");

        props.put(
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class);

        props.put(
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                KafkaAvroSerializer.class);

        props.put(
                "schema.registry.url",
                "http://localhost:8081");

        props.put(
                ProducerConfig.ACKS_CONFIG,
                "all");

        props.put(
                ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG,
                true);

        props.put(
                ProducerConfig.RETRIES_CONFIG,
                Integer.MAX_VALUE);

        props.put(
                ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,
                5);

        props.put(
                ProducerConfig.COMPRESSION_TYPE_CONFIG,
                "snappy");

        props.put(
                ProducerConfig.BATCH_SIZE_CONFIG,
                65536);

        props.put(
                ProducerConfig.LINGER_MS_CONFIG,
                10);

        return props;
    }
}

Why These Producer Settings?

acks=all

Leader
Follower1
Follower2

All ACK

Producer Success

The producer waits until all in-sync replicas acknowledge the write before considering the transaction successful.

This minimizes the risk of data loss in the event of broker failures.

enable.idempotence=true

Ensures duplicate transaction events are not written during retry scenarios.

Retry Occurs

Same Transaction

Stored Only Once

compression.type=snappy

Reduces network traffic and broker storage utilization while maintaining good compression speed.

linger.ms=10

The producer waits for 10 milliseconds before sending records.

More Records
+
Larger Batch
=
Higher Throughput

This improves throughput by reducing network calls.

Producer Service

TransactionProducerService.java

package com.banking.transaction.producer.service;

import com.banking.avro.TransactionEvent;

import lombok.RequiredArgsConstructor;

import org.apache.kafka.clients.producer.ProducerRecord;

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

import java.time.Instant;
import java.util.UUID;

@Service
@RequiredArgsConstructor
public class TransactionProducerService {

    private final KafkaTemplate<String,
            TransactionEvent> kafkaTemplate;

    public void publishTransaction(
            String accountNumber,
            Double amount,
            String transactionType) {

        String transactionId =
                UUID.randomUUID().toString();

        TransactionEvent event =
                TransactionEvent.newBuilder()
                        .setTransactionId(transactionId)
                        .setAccountNumber(accountNumber)
                        .setAmount(amount)
                        .setTransactionType(transactionType)
                        .setCreatedAt(
                                Instant.now().toString())
                        .build();

        ProducerRecord<String,
                TransactionEvent> record =
                new ProducerRecord<>(
                        "bank-transactions",
                        accountNumber,
                        event);

        record.headers().add(
                "correlation-id",
                UUID.randomUUID()
                        .toString()
                        .getBytes());

        kafkaTemplate.send(record)
                .whenComplete(
                        (result, ex) -> {

                            if (ex != null) {

                                System.out.println(
                                        "FAILED : "
                                                + transactionId);
                            } else {

                                System.out.println(
                                        "SUCCESS : "
                                                + transactionId
                                                + " partition="
                                                + result
                                                .getRecordMetadata()
                                                .partition());
                            }
                        });
    }
}

Why Account Number as Kafka Key?

new ProducerRecord<>(
    "bank-transactions",
    accountNumber,
    event
)

Kafka uses the message key to determine the target partition.

hash(accountNumber) % partitions

Example

Account 10001

Tx1
Tx2
Tx3

All transactions for the same account are routed to the same partition, preserving transaction order.

Always Same Partition

Ordering Preserved

Correlation IDs and Traceability

Every event contains a correlation identifier in Kafka headers.

record.headers().add(
    "correlation-id",
    UUID.randomUUID()
        .toString()
        .getBytes());

Correlation IDs enable distributed tracing across multiple microservices.

Example flow:

Transaction Service
       |
       v
Fraud Service
       |
       v
Settlement Service
       |
       v
Notification Service

The same correlation ID can be tracked across all services.

REST Controller

TransactionController.java

package com.banking.transaction.producer.controller;

import com.banking.transaction.producer.dto.TransactionRequest;

import com.banking.transaction.producer.service.TransactionProducerService;

import lombok.RequiredArgsConstructor;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/transactions")
@RequiredArgsConstructor
public class TransactionController {

    private final TransactionProducerService
            producerService;

    @PostMapping
    public ResponseEntity<String> publish(
            @RequestBody
            TransactionRequest request) {

        producerService.publishTransaction(
                request.getAccountNumber(),
                request.getAmount().doubleValue(),
                request.getTransactionType());

        return ResponseEntity.ok(
                "Transaction Published");
    }
}

Application Class

BankingProducerApplication.java

package com.banking.transaction.producer;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class BankingProducerApplication {

    public static void main(String[] args) {

        SpringApplication.run(
                BankingProducerApplication.class,
                args);
    }
}

application.yml

spring:

  kafka:

    producer:

      bootstrap-servers: localhost:9092

      properties:

        schema.registry.url:
          http://localhost:8081

      acks: all

  application:
    name: banking-producer

High Throughput Configuration Summary

Property	Purpose
acks=all	Maximum durability
enable.idempotence=true	Duplicate prevention
retries=Integer.MAX_VALUE	Automatic recovery
max.in.flight.requests.per.connection=5	Parallel requests
compression.type=snappy	Lower network usage
batch.size=65536	Larger producer batches
linger.ms=10	Batch optimization

Interview Explanation (15+ Years)

When asked to explain the Kafka producer design, a senior engineer should highlight durability, ordering, traceability, and scalability.

We configured the producer with acks=all and idempotence enabled to guarantee durability and avoid duplicates. Events were serialized using Avro and registered in Schema Registry to support schema evolution. Account number was used as the message key to ensure ordering for all transactions belonging to the same account. Correlation IDs were attached to Kafka headers for distributed tracing across microservices. Compression and batching parameters were tuned to maximize throughput while reducing network utilization. The producer publishes immutable transaction events to a partitioned Kafka topic with replication factor 3, enabling fault tolerance, horizontal scalability, and regulatory-grade reliability for banking workloads.

Enterprise Kafka Consumer Architecture for Banking Systems

In enterprise banking platforms, Kafka consumers are responsible for critical workloads such as settlement processing, fraud detection, auditing, analytics, and notifications. Unlike producers, consumers must handle duplicate messages, retry failures, dead-letter processing, manual offset management, and high-throughput database persistence. This article demonstrates a production-grade Kafka consumer implementation suitable for large-scale banking systems.

Why the Consumer is More Important Than the Producer

For senior-level interviews (15+ years experience), the consumer side is often considered more important because it is responsible for business processing and data consistency.

Batch Processing
Idempotency
Retry Topics
Dead Letter Topic (DLT)
Manual Offset Commit
PostgreSQL Persistence
High Throughput Processing
Fault Tolerance

Consumer Module Structure

com.banking.transaction.consumer

├── config
│   ├── KafkaConsumerConfig.java
│   └── KafkaErrorHandlerConfig.java
│
├── entity
│   ├── TransactionEntity.java
│   └── ProcessedEventEntity.java
│
├── repository
│   ├── TransactionRepository.java
│   └── ProcessedEventRepository.java
│
├── service
│   └── SettlementConsumerService.java
│
└── BankingConsumerApplication.java

Transaction Entity

TransactionEntity.java

@Entity
@Table(name = "transactions")
public class TransactionEntity {

    @Id
    private String transactionId;

    private String accountNumber;

    private BigDecimal amount;

    private String transactionType;

    private Instant createdAt;
}

The transaction table stores successfully processed settlement records.

Idempotency Table

Idempotency is one of the most important concepts in Kafka-based banking systems.

ProcessedEventEntity.java

@Entity
@Table(name = "processed_events")
public class ProcessedEventEntity {

    @Id
    private String eventId;
}

Why Is This Required?

Transaction TX1001 Processed

DB Commit Success

Offset Commit Failed

Consumer Restarted

Kafka Delivers TX1001 Again

Without idempotency protection, the same transaction may be processed multiple times, resulting in duplicate account credits or debits.

Account Credited Twice

Financial Inconsistency

Repository Layer

TransactionRepository.java

public interface TransactionRepository
        extends JpaRepository<
        TransactionEntity,
        String> {
}

ProcessedEventRepository.java

public interface ProcessedEventRepository
        extends JpaRepository<
        ProcessedEventEntity,
        String> {
}

Consumer Configuration

KafkaConsumerConfig.java

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<
            String,
            TransactionEvent> batchFactory(
            ConsumerFactory consumerFactory) {

        ConcurrentKafkaListenerContainerFactory<
                String,
                TransactionEvent> factory =
                new ConcurrentKafkaListenerContainerFactory<>();

        factory.setConsumerFactory(
                consumerFactory);

        factory.setBatchListener(true);

        return factory;
    }
}

application.yml

spring:

  kafka:

    consumer:

      group-id: settlement-group

      enable-auto-commit: false

      auto-offset-reset: earliest

      max-poll-records: 1000

      properties:

        schema.registry.url:
          http://localhost:8081

        specific.avro.reader: true

    listener:

      ack-mode: manual

      type: batch

      concurrency: 6

Why Batch Processing?

Traditional Approach

1000 Records
=
1000 Database Inserts

Batch Approach

1000 Records
=
1 Batch Database Operation

Batch processing dramatically reduces database round trips and improves throughput for high-volume banking workloads.

Error Handling and Retry Topics

KafkaErrorHandlerConfig.java

@Configuration
public class KafkaErrorHandlerConfig {

    @Bean
    public DefaultErrorHandler errorHandler(
            KafkaTemplate template) {

        DeadLetterPublishingRecoverer recoverer =
                new DeadLetterPublishingRecoverer(
                        template,
                        (record, ex) ->
                                new TopicPartition(
                                        record.topic()
                                                + ".DLT",
                                        record.partition()));

        return new DefaultErrorHandler(
                recoverer,
                new FixedBackOff(
                        3000L,
                        3));
    }
}

Retry Flow

bank-transactions

      |
      v

Retry Attempt 1

      |
      v

Retry Attempt 2

      |
      v

Retry Attempt 3

      |
      v

bank-transactions.DLT

Messages that continue to fail after configured retries are routed to a Dead Letter Topic (DLT) for investigation and replay.

Settlement Consumer Service

SettlementConsumerService.java

@Service
@RequiredArgsConstructor
public class SettlementConsumerService {

    private final TransactionRepository
            transactionRepository;

    private final ProcessedEventRepository
            processedEventRepository;

    @Transactional
    @KafkaListener(
            topics = "bank-transactions",
            containerFactory = "batchFactory")
    public void processTransactions(
            List events,
            Acknowledgment acknowledgment) {

        // Process batch

        acknowledgment.acknowledge();
    }
}

Consumer Processing Flow

Kafka Topic
      |
      v
Batch Consumer
      |
      v
Idempotency Check
      |
      v
Database Transaction
      |
      v
Save Processed Events
      |
      v
Manual Offset Commit
      |
      v
Success

What Happens Internally?

Step 1

Kafka Delivers

TX1
TX2
TX3
TX4

Step 2

Check Idempotency Table

SELECT *
FROM processed_events
WHERE event_id='TX1';

If the event exists, processing is skipped.

Step 3

Batch Insert Transactions

INSERT INTO transactions
VALUES (...),
       (...),
       (...),
       (...);

Step 4

Save Processed Events

INSERT INTO processed_events
VALUES ('TX1'),
       ('TX2'),
       ('TX3');

Step 5

Manual Offset Commit

acknowledgment.acknowledge();

Offsets are committed only after successful database persistence.

Kafka Topic Configuration

kafka-topics.sh \
--create \
--topic bank-transactions \
--partitions 48 \
--replication-factor 3

Interview Questions and Answers

Why Disable Auto Commit?

Auto commit may acknowledge offsets before database persistence completes. If the application crashes after offset commit but before database commit, messages are permanently lost. Manual acknowledgment ensures offsets are committed only after successful processing.

Why Use an Idempotency Table?

Kafka guarantees at-least-once delivery. Duplicate message delivery is possible during failures and consumer restarts. The processed_events table prevents duplicate settlement operations.

Why Use Batch Consumers?

Batch consumers reduce database round trips, network overhead, and transaction costs, resulting in significantly higher throughput.

Why Use a Dead Letter Topic?

Poison messages should not block partition consumption. Failed messages are isolated into a DLT where they can be analyzed and replayed later.

Production Banking Enhancements

Schema Registry Integration
Avro Serialization
Distributed Tracing with Correlation IDs
Prometheus Metrics
Grafana Dashboards
OpenTelemetry Tracing
Retry Topics with Exponential Backoff
Consumer Lag Monitoring
Cooperative Rebalancing
Disaster Recovery Strategy

Architect-Level Summary

The consumer group processes transaction events in batches using manual offset management. Idempotency is enforced through a processed-events table, ensuring duplicate deliveries do not result in duplicate business processing. Database persistence occurs within a transaction, and Kafka offsets are acknowledged only after successful commit. Retry handling and dead-letter topics isolate problematic messages without affecting healthy traffic. This architecture provides the scalability, reliability, throughput, and fault tolerance required for modern banking systems processing millions of transactions per day.

The next enterprise-level pattern typically implemented after this design is the Transactional Outbox Pattern, which eliminates dual-write inconsistencies between PostgreSQL and Kafka.

Transactional Outbox Pattern (Banking-Grade)

For a 15+ years Architect/Principal Engineer interview, one of the most common Kafka questions is:

"How do you avoid the dual-write problem?"

The Problem

Suppose you do this:

accountRepository.save(account);

kafkaTemplate.send(
    "bank-transactions",
    transactionEvent);

Looks simple.

But what if:

DB Save = SUCCESS

Kafka Publish = FAILED

Now:

Money debited
No event published
Fraud system won't see it
Audit won't see it
Settlement won't see it
System becomes inconsistent

Another Failure Scenario

Kafka Publish = SUCCESS

DB Save = FAILED

Now Kafka contains an event that never happened.

Also bad.

Solution: Transactional Outbox

Instead of writing directly to Kafka:

Request
  |
  v

DB Transaction
  |
  +--> Account Table
  |
  +--> Outbox Table
  |
COMMIT

After Commit

Outbox Publisher
    |
    v

Kafka Topic

Architecture

To understand the underlying broker architecture supporting this design: Understanding Apache Kafka Architecture and Core Concepts .

Client
  |
  v

Transaction Service

  |
  +----------------------+
  | DB Transaction       |
  |                      |
  | Account Update       |
  | Outbox Insert        |
  +----------------------+

  |
 COMMIT

  |
  v

Outbox Publisher

  |
  v

Kafka

Project Structure

com.banking.transaction

├── entity
│   ├── AccountEntity.java
│   └── OutboxEventEntity.java
│
├── repository
│   ├── AccountRepository.java
│   └── OutboxRepository.java
│
├── service
│   ├── BankingTransactionService.java
│   └── OutboxPublisherService.java
│
└── scheduler
    └── OutboxScheduler.java

Account Entity

AccountEntity.java

package com.banking.transaction.entity;

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Table;

import java.math.BigDecimal;

@Entity
@Table(name = "accounts")
public class AccountEntity {

    @Id
    private String accountNumber;

    private BigDecimal balance;

    public String getAccountNumber() {
        return accountNumber;
    }

    public void setAccountNumber(
            String accountNumber) {
        this.accountNumber = accountNumber;
    }

    public BigDecimal getBalance() {
        return balance;
    }

    public void setBalance(
            BigDecimal balance) {
        this.balance = balance;
    }
}

Outbox Entity

OutboxEventEntity.java

package com.banking.transaction.entity;

import jakarta.persistence.*;

import java.time.Instant;

@Entity
@Table(name = "outbox_events")
public class OutboxEventEntity {

    @Id
    private String eventId;

    private String aggregateId;

    private String eventType;

    @Column(length = 10000)
    private String payload;

    private String status;

    private Instant createdAt;

    public String getEventId() {
        return eventId;
    }

    public void setEventId(
            String eventId) {
        this.eventId = eventId;
    }

    public String getAggregateId() {
        return aggregateId;
    }

    public void setAggregateId(
            String aggregateId) {
        this.aggregateId = aggregateId;
    }

    public String getEventType() {
        return eventType;
    }

    public void setEventType(
            String eventType) {
        this.eventType = eventType;
    }

    public String getPayload() {
        return payload;
    }

    public void setPayload(
            String payload) {
        this.payload = payload;
    }

    public String getStatus() {
        return status;
    }

    public void setStatus(
            String status) {
        this.status = status;
    }

    public Instant getCreatedAt() {
        return createdAt;
    }

    public void setCreatedAt(
            Instant createdAt) {
        this.createdAt = createdAt;
    }
}

Outbox Table

CREATE TABLE outbox_events
(
  event_id VARCHAR(100) PRIMARY KEY,

  aggregate_id VARCHAR(100),

  event_type VARCHAR(50),

  payload TEXT,

  status VARCHAR(20),

  created_at TIMESTAMP
);

Repository

OutboxRepository.java

package com.banking.transaction.repository;

import com.banking.transaction.entity
        .OutboxEventEntity;

import org.springframework.data.jpa.repository
        .JpaRepository;

import java.util.List;

public interface OutboxRepository
        extends JpaRepository<
        OutboxEventEntity,
        String> {

    List
    findTop100ByStatusOrderByCreatedAt(
            String status);
}

Business Service

BankingTransactionService.java

Critical Part

package com.banking.transaction.service;

import com.banking.transaction.entity.AccountEntity;
import com.banking.transaction.entity.OutboxEventEntity;

import com.banking.transaction.repository
        .AccountRepository;

import com.banking.transaction.repository
        .OutboxRepository;

import com.fasterxml.jackson.databind
        .ObjectMapper;

import org.springframework.stereotype.Service;

import org.springframework.transaction
        .annotation.Transactional;

import java.math.BigDecimal;
import java.time.Instant;
import java.util.UUID;

@Service
public class BankingTransactionService {

    private final AccountRepository
            accountRepository;

    private final OutboxRepository
            outboxRepository;

    private final ObjectMapper objectMapper;

    public BankingTransactionService(
            AccountRepository accountRepository,
            OutboxRepository outboxRepository,
            ObjectMapper objectMapper) {

        this.accountRepository =
                accountRepository;

        this.outboxRepository =
                outboxRepository;

        this.objectMapper =
                objectMapper;
    }

    @Transactional
    public void transferMoney(
            String accountNumber,
            BigDecimal amount)
            throws Exception {

        AccountEntity account =
                accountRepository
                        .findById(accountNumber)
                        .orElseThrow();

        account.setBalance(
                account.getBalance()
                        .subtract(amount));

        accountRepository.save(account);

        OutboxEventEntity outbox =
                new OutboxEventEntity();

        outbox.setEventId(
                UUID.randomUUID()
                        .toString());

        outbox.setAggregateId(
                accountNumber);

        outbox.setEventType(
                "MONEY_DEBITED");

        outbox.setPayload(
                objectMapper
                        .writeValueAsString(
                                account));

        outbox.setStatus(
                "NEW");

        outbox.setCreatedAt(
                Instant.now());

        outboxRepository.save(outbox);
    }
}

Why Is This Safe?

Everything happens inside:

@Transactional

Either:

Account Update SUCCESS
Outbox Insert SUCCESS

Both Rollback

The system never reaches an inconsistent state.

Outbox Publisher

OutboxPublisherService.java

package com.banking.transaction.service;

import com.banking.transaction.entity
        .OutboxEventEntity;

import com.banking.transaction.repository
        .OutboxRepository;

import lombok.RequiredArgsConstructor;

import org.springframework.kafka.core
        .KafkaTemplate;

import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class OutboxPublisherService {

    private final KafkaTemplate<
            String,
            String> kafkaTemplate;

    private final OutboxRepository
            outboxRepository;

    public void publish(
            OutboxEventEntity event) {

        kafkaTemplate.send(
                "bank-transactions",
                event.getAggregateId(),
                event.getPayload());

        event.setStatus(
                "PUBLISHED");

        outboxRepository.save(event);
    }
}

Scheduler

Poll unpublished events.

OutboxScheduler.java

package com.banking.transaction.scheduler;

import com.banking.transaction.entity
        .OutboxEventEntity;

import com.banking.transaction.repository
        .OutboxRepository;

import com.banking.transaction.service
        .OutboxPublisherService;

import lombok.RequiredArgsConstructor;

import org.springframework.scheduling
        .annotation.Scheduled;

import org.springframework.stereotype.Component;

import java.util.List;

@Component
@RequiredArgsConstructor
public class OutboxScheduler {

    private final OutboxRepository
            outboxRepository;

    private final OutboxPublisherService
            publisherService;

    @Scheduled(fixedDelay = 5000)
    public void publishEvents() {

        List events =
                outboxRepository
                        .findTop100ByStatusOrderByCreatedAt(
                                "NEW");

        for (OutboxEventEntity event
                : events) {

            publisherService.publish(
                    event);
        }
    }
}

Production Version

Senior-level systems typically replace polling with:

Debezium
Kafka Connect

Flow

DB Commit

      |
      v

Outbox Table

      |
      v

Debezium CDC

      |
      v

Kafka Topic

No scheduler required.

Interview Answer (15+ Years)

We used the Transactional Outbox Pattern to eliminate the dual-write problem between the database and Kafka. Business state changes and outbox records were persisted within the same database transaction. After commit, a publisher process (or Debezium CDC in production) published outbox events to Kafka. This guaranteed that every committed business transaction produced an event, and no event could exist without a corresponding database change. The pattern provided strong consistency, auditability, and reliable event delivery required in banking systems.

Architect-Level Comparison

Approach	Risk
DB Save → Kafka Send	Dual-write inconsistency
Kafka Send → DB Save	Phantom events
XA / 2PC	Complex, poor scalability
Transactional Outbox	Preferred for modern banking microservices

Key Takeaway

The Transactional Outbox Pattern is one of the most frequently discussed patterns in senior Kafka, microservices, event-driven architecture, and banking-system interviews because it solves the dual-write problem without introducing distributed transaction complexity.

Docker Compose for Enterprise Kafka Banking Platform

A senior interviewer may ask:

"How would you bring up a complete Kafka platform locally for development or lower environments?"

A Typical Enterprise Stack Includes

+--------------------------------------------------+
|                  Banking Platform                |
+--------------------------------------------------+

Spring Boot Producer
Spring Boot Consumer
PostgreSQL

Kafka Cluster
Schema Registry
Kafka UI

Prometheus
Grafana

(Optional)

Debezium
Kafka Connect

Folder Structure

banking-platform

├── docker-compose.yml

├── prometheus
│   └── prometheus.yml

├── grafana
│   └── dashboards

└── applications
    ├── producer
    └── consumer

docker-compose.yml

Kafka KRaft Mode (No Zookeeper)

version: '3.9'

services:

  postgres:

    image: postgres:16

    container_name: postgres

    environment:
      POSTGRES_DB: bankingdb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres

    ports:
      - "5432:5432"

    volumes:
      - postgres-data:/var/lib/postgresql/data

  kafka:

    image: confluentinc/cp-kafka:7.7.0

    container_name: kafka

    hostname: kafka

    ports:
      - "9092:9092"

    environment:

      KAFKA_NODE_ID: 1

      KAFKA_PROCESS_ROLES: broker,controller

      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:29093

      KAFKA_LISTENERS: >
        PLAINTEXT://0.0.0.0:9092,
        CONTROLLER://0.0.0.0:29093

      KAFKA_ADVERTISED_LISTENERS: >
        PLAINTEXT://localhost:9092

      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: >
        PLAINTEXT:PLAINTEXT,
        CONTROLLER:PLAINTEXT

      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER

      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT

      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk

      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1

      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1

    volumes:
      - kafka-data:/var/lib/kafka/data

  schema-registry:

    image: confluentinc/cp-schema-registry:7.7.0

    container_name: schema-registry

    depends_on:
      - kafka

    ports:
      - "8081:8081"

    environment:

      SCHEMA_REGISTRY_HOST_NAME: schema-registry

      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092

  kafka-ui:

    image: provectuslabs/kafka-ui:latest

    container_name: kafka-ui

    depends_on:
      - kafka

    ports:
      - "8080:8080"

    environment:

      KAFKA_CLUSTERS_0_NAME: banking-cluster

      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092

      KAFKA_CLUSTERS_0_SCHEMAREGISTRY: http://schema-registry:8081

  prometheus:

    image: prom/prometheus

    container_name: prometheus

    ports:
      - "9090:9090"

    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:

    image: grafana/grafana

    container_name: grafana

    ports:
      - "3000:3000"

    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin

volumes:

  postgres-data:

  kafka-data:

Prometheus Configuration

prometheus.yml

global:

  scrape_interval: 15s

scrape_configs:

  - job_name: 'producer'

    metrics_path: '/actuator/prometheus'

    static_configs:
      - targets:
          - host.docker.internal:8085

  - job_name: 'consumer'

    metrics_path: '/actuator/prometheus'

    static_configs:
      - targets:
          - host.docker.internal:8086

Create Kafka Topic

After Kafka starts:

docker exec -it kafka bash

Create Topic:

kafka-topics \
--create \
--topic bank-transactions \
--partitions 48 \
--replication-factor 1 \
--bootstrap-server localhost:9092

Verify Topic

kafka-topics \
--describe \
--topic bank-transactions \
--bootstrap-server localhost:9092

Example:

Topic: bank-transactions

Partitions: 48

Replication Factor: 1

Schema Registry Check

http://localhost:8081

You should see registered Avro schemas after producers publish events.

Kafka UI

http://localhost:8080

You can:

View Topics
View Partitions
View Consumer Groups
View Messages
Replay Messages

PostgreSQL Check

docker exec -it postgres psql -U postgres -d bankingdb

List tables:

\dt

Grafana

http://localhost:3000

Default Credentials:

username: admin
password: admin

Metrics to Monitor

Kafka Metrics

Consumer Lag
Messages/sec
Bytes In/sec
Bytes Out/sec
ISR Changes
Under Replicated Partitions

Application Metrics

Transaction Success Rate
Failed Transactions
DLT Count
Retry Count
DB Latency
Kafka Publish Latency

Production Banking Version

Kafka Broker 1
Kafka Broker 2
Kafka Broker 3

Topic:

Partitions = 48

Replication Factor = 3

min.insync.replicas = 2

This provides:

Broker Failure Tolerance
High Availability
No Data Loss

Architect-Level Interview Answer

For local development we use Docker Compose to provision PostgreSQL, Kafka in KRaft mode, Schema Registry, Kafka UI, Prometheus, and Grafana. Kafka topics are created with partitioning strategies aligned to expected throughput. Producers use Avro and Schema Registry for schema evolution. Metrics are exported through Spring Boot Actuator and scraped by Prometheus. Grafana dashboards provide visibility into consumer lag, throughput, retries, and DLT volume. In production, the same architecture scales to a multi-broker Kafka cluster with replication, ISR configuration, and centralized observability.

Complete Enterprise Banking Kafka Architecture (15+ Years Level)

This is the level expected for a Principal Engineer, Solution Architect, Staff Engineer, or Engineering Manager.

Business Requirement

A customer performs:

Mobile Banking Transfer

Account A → Account B

₹10,000

Requirements:

No money loss
No duplicate processing
Full audit trail
Fraud detection
Real-time notifications
Regulatory compliance
Disaster recovery

End-to-End Architecture

                ┌─────────────────────┐
                │  Mobile Banking App │
                └──────────┬──────────┘
                           │
                           ▼

                ┌─────────────────────┐
                │ API Gateway         │
                └──────────┬──────────┘
                           │
                           ▼

                ┌─────────────────────┐
                │ Transaction Service │
                └──────────┬──────────┘
                           │
             DB Transaction│
                           ▼

        ┌───────────────────────────────┐
        │ PostgreSQL                    │
        │ Accounts                      │
        │ Transactions                  │
        │ Outbox Events                 │
        └───────────────┬───────────────┘
                        │
                        ▼

                ┌─────────────────────┐
                │ Debezium CDC        │
                └──────────┬──────────┘
                           │
                           ▼

                ┌─────────────────────┐
                │ Apache Kafka        │
                │ bank-transactions   │
                └──────┬──────┬───────┘
                       │      │

           ┌───────────┘      └───────────┐

           ▼                              ▼

┌──────────────────┐         ┌──────────────────┐
│ Fraud Service    │         │ Settlement       │
│ Consumer Group   │         │ Consumer Group   │
└─────────┬────────┘         └─────────┬────────┘
          │                            │
          ▼                            ▼

 Fraud Database              Settlement Database

           ┌───────────────────────────────┐
           ▼                               ▼

 Notification Service         Audit Service

Topic Design

Main Topic

bank-transactions

Configuration:

Partitions            : 48
Replication Factor    : 3
Min ISR               : 2
Retention             : 30 Days
Cleanup Policy        : delete

Why 48 Partitions?

We estimated peak throughput of approximately 10k TPS. A 48-partition topic provided sufficient horizontal scalability while maintaining ordering guarantees per account.

Producer Design

acks=all
enable.idempotence=true
retries=Integer.MAX_VALUE
compression.type=snappy
linger.ms=10
batch.size=65536

Property	Purpose
acks=all	Durability
idempotence	No duplicates
retries	Network recovery
compression	Lower bandwidth
batching	Higher throughput

Partitioning Strategy

Producer Key:

accountNumber

Account 1001

Tx1
Tx2
Tx3

hash(accountNumber) % 48

All events go to the same partition. Ordering is preserved.

Schema Strategy

Learn more about Avro serialization and schema evolution: Working with Schema Registry and Avro .

Apache Avro
Confluent Schema Registry

Example v1

{
  "transactionId":"123",
  "amount":1000
}

Example v2

{
  "transactionId":"123",
  "amount":1000,
  "currency":"INR"
}

Compatibility Mode: BACKWARD

Consumers continue working after schema evolution.

Dual Write Problem Solution

Accounts Table
Transactions Table
Outbox Table

Single DB Transaction

Debezium CDC

→ Kafka

Consumer Design

Consumer Group:

settlement-group

enable-auto-commit: false
ack-mode: manual
max-poll-records: 1000
concurrency: 6

Batch Processing

Consumer receives:

1000 transactions

Instead of:

1000 inserts

Do:

INSERT INTO transactions
VALUES (...)

Batch insert provides significant performance gains.

Idempotency

Problem:

DB Success

Offset Commit Failure

Message Reprocessed

Without protection:

Account Credited Twice

Solution:

processed_events

event_id PRIMARY KEY

SELECT *
FROM processed_events
WHERE event_id=?

Retry Architecture

bank-transactions

        |
        ▼

retry-5m

        |
        ▼

retry-30m

        |
        ▼

DLT

Topics:

bank-transactions-retry-5m
bank-transactions-retry-30m
bank-transactions-dlt

DLT Replay

DLT

→ Main Topic

Controlled replay after issue resolution.

Exactly Once Semantics

Interview Question:

"How do you achieve exactly-once?"

Answer:

Kafka Alone

Exactly Once in Kafka

Not necessarily:

Kafka → Database

Required:

Idempotency
Outbox Pattern
Transactional Processing

Combined result: Effectively Exactly Once Processing.

Security

Authentication:
SASL_SSL

Authorization:
ACLs

Encryption:
TLS 1.3

Secrets:
Vault

Monitoring

Comprehensive Kafka monitoring strategies are covered in: Monitoring and Alerting in Apache Kafka Clusters .

Consumer Lag
DLT Count
Retry Count
Broker Health
ISR Shrink
Disk Usage
Publish Latency

Tools:

Prometheus
Grafana
OpenTelemetry

Disaster Recovery

Cross-cluster replication and DR strategies are discussed in: Multi-Cluster Kafka Deployments and MirrorMaker .

Primary:
Mumbai DC

Secondary:
Hyderabad DC

Replication:
MirrorMaker 2

Scaling Strategy

Current:

48 Partitions
6 Consumers

Peak Traffic:

48 Partitions
24 Consumers

Scale horizontally with no code changes.

Senior Architect Interview Answer

We built an event-driven banking platform using Kafka as the central event backbone. Transaction services persisted business state and outbox records within the same PostgreSQL transaction, eliminating dual-write inconsistencies. Debezium CDC streamed committed outbox events to Kafka topics configured with 48 partitions and replication factor 3. Producers used idempotence and acks=all, while consumers processed events in batches with manual offset management. Idempotency tables prevented duplicate settlement, and retry topics with DLT handling isolated poison messages. Avro and Schema Registry enabled schema evolution, while Prometheus and Grafana provided operational visibility. The architecture supported high throughput, fault tolerance, auditability, and regulatory compliance requirements expected in large-scale banking systems.

This is the complete architectural story you can walk through in a senior Kafka/banking interview—from API request all the way to settlement, fraud detection, monitoring, retries, and disaster recovery.

Frequently Asked Questions

Why use account number as Kafka key?

Using account number as the key guarantees ordering of transactions for the same account because Kafka preserves ordering within a partition.

Why not use database and Kafka in the same transaction?

Distributed transactions introduce complexity and scalability concerns. The Transactional Outbox Pattern provides a simpler and more reliable alternative.

Why is replication factor set to 3?

Replication factor 3 provides fault tolerance against broker failures while maintaining durability and availability.

How does Debezium solve the dual-write problem?

Debezium captures committed database changes and publishes them to Kafka, ensuring reliable event delivery after database commits.