Published: 2026-06-01 • Updated: 2026-06-20

Implementing Retrieval-Augmented Generation (RAG) with Spring AI

Retrieval-Augmented Generation, commonly called RAG, is one of the most important architectures for building reliable AI applications. A normal chat model answers using the knowledge it learned during training. But enterprise applications usually need answers from private, updated, domain-specific data such as documents, FAQs, policies, product catalogs, interview questions, course content, tickets, or database records.

RAG solves this problem by retrieving relevant information from your own knowledge base and giving that context to the AI model before generating the final answer.

Spring AI provides support for RAG flows using VectorStore, embeddings, ChatClient, and Advisor APIs. The Spring AI documentation explains that RAG helps overcome LLM limitations around long-form content, factual accuracy, and context awareness, and it provides Advisor-based support such as QuestionAnswerAdvisor for common RAG workflows. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html))


What is RAG?

RAG stands for Retrieval-Augmented Generation.

It combines two steps:

  • Retrieval: Find relevant information from your own documents or database
  • Generation: Send the retrieved context to an AI model and generate an answer

Simple RAG Flow

User Question
      |
      v
Search Knowledge Base
      |
      v
Retrieve Relevant Context
      |
      v
Send Context + Question to LLM
      |
      v
Generate Grounded Answer

Why Is RAG Needed?

Without RAG, an AI model may:

  • Give outdated answers
  • Guess information
  • Hallucinate facts
  • Fail to answer company-specific questions
  • Miss private business knowledge
  • Provide generic responses

With RAG, the model receives real context from your documents before answering.


Normal LLM vs RAG-Based LLM

Normal LLM                    | RAG-Based LLM
------------------------------|------------------------------------
Uses model training knowledge | Uses retrieved enterprise knowledge
May hallucinate               | More grounded in provided context
May be outdated               | Can use updated documents
Generic answers               | Domain-specific answers
No source tracking            | Can track source documents

Real-World Learning Platform Example

Suppose a learning platform has course content about:

  • Java
  • Spring Boot
  • Microservices
  • Docker
  • Kubernetes
  • Spring AI
  • Agentic AI

User asks:

Which course should I learn to build scalable backend systems?

A RAG system can search existing course content, retrieve relevant lessons, and generate a personalized answer using platform-specific content.


Real-World Banking Example

A banking AI assistant may answer questions using verified banking documents.

User:
Amount was debited but UPI transaction failed. When will it be reversed?

RAG flow:

  1. Search failed UPI transaction policy
  2. Retrieve reversal timeline document
  3. Send policy context to chat model
  4. Generate grounded response

The model should not guess. It should answer only from the retrieved banking policy.


Real-World E-Commerce Example

An e-commerce AI assistant may answer refund and delivery questions.

User:
Can I return a damaged mobile phone after delivery?

RAG retrieves:

  • Return policy
  • Damaged product policy
  • Refund processing timeline
  • Replacement rules

The AI then generates a clear, customer-friendly answer from the retrieved policies.


Core Components of RAG

Component       | Purpose
----------------|------------------------------------------
Documents       | Knowledge source
Chunking        | Splits large documents into smaller parts
Embedding Model | Converts text into vectors
Vector Store    | Stores and searches embeddings
Retriever       | Finds relevant chunks
Chat Model      | Generates answer using retrieved context
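
To make the Embedding Model and Vector Store rows concrete, here is a minimal sketch of how two embeddings can be compared. It assumes embeddings are plain float arrays and uses cosine similarity, one common measure. The class name CosineSimilarity is illustrative and not part of Spring AI; a real vector store performs this comparison inside its index.

```java
// Illustrative helper, not a Spring AI class.
final class CosineSimilarity {

    private CosineSimilarity() {
    }

    // Returns a value in [-1, 1]; higher means the vectors point in a more similar direction.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // treat a zero vector as having no similarity
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Chunks whose embeddings score closest to the question embedding are the ones the retriever hands to the chat model.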

Spring AI RAG Architecture

Documents
   |
   v
Text Extraction
   |
   v
Chunking
   |
   v
EmbeddingModel
   |
   v
VectorStore
   |
   v
Similarity Search
   |
   v
ChatClient
   |
   v
Grounded Answer

Spring AI Building Blocks for RAG

Spring AI provides important abstractions for RAG:

  • Document
  • EmbeddingModel
  • VectorStore
  • ChatClient
  • QuestionAnswerAdvisor
  • VectorStoreRetriever

Spring AI also provides VectorStoreRetriever, a read-only view of a vector store that exposes similarity search functionality. This is useful in RAG applications where the application only needs retrieval access and should not modify vector data. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/vectordbs.html))


Step 1: Choose a Vector Database

RAG needs a vector store to save and search embeddings.

Common options:

  • PGVector
  • Pinecone
  • MongoDB Atlas Vector Search
  • Redis Vector Search
  • Qdrant
  • Milvus
  • Weaviate
  • Elasticsearch
  • OpenSearch

When to Use PGVector?

  • You already use PostgreSQL
  • Your dataset is small to medium
  • You want simple local development
  • You prefer SQL-based infrastructure

When to Use Pinecone or Cloud Vector DB?

  • You need managed vector infrastructure
  • You expect large-scale vector search
  • You want easier scaling
  • You need production-ready cloud vector retrieval

Step 2: Add Dependencies

Example using OpenAI and PGVector:

<!-- Import the Spring AI BOM so the starters below do not need explicit versions. -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- OpenAI chat and embedding models -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>

    <!-- PGVector-backed VectorStore -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
    </dependency>

    <!-- PostgreSQL JDBC driver -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>

Advisor Dependency for Spring AI RAG

Spring AI documentation says QuestionAnswerAdvisor and VectorStoreChatMemoryAdvisor require the spring-ai-advisors-vector-store dependency. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html))

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>

Step 3: Configure application.properties

spring.application.name=spring-ai-rag-demo

spring.ai.model.chat=openai
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini

spring.ai.model.embedding=openai
spring.ai.openai.embedding.options.model=text-embedding-3-small

spring.datasource.url=jdbc:postgresql://localhost:5432/spring_ai
spring.datasource.username=postgres
spring.datasource.password=postgres

spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=1536

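The vector dimension must match the embedding model's output dimension. OpenAI's text-embedding-3-small returns 1536-dimensional vectors by default, which is why dimensions is set to 1536 here.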


Step 4: Start PGVector with Docker

docker run --name pgvector-db \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=spring_ai \
  -p 5432:5432 \
  -d pgvector/pgvector:pg16

Enable the extension by running the SQL statement through psql inside the container:

docker exec pgvector-db psql -U postgres -d spring_ai -c "CREATE EXTENSION IF NOT EXISTS vector;"

Step 5: Create Document Ingestion Service

The ingestion service stores documents into the vector store.

package com.dhanish.rag.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;

@Service
public class DocumentIngestionService {

    private final VectorStore vectorStore;

    public DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void loadSampleDocuments() {

        Document doc1 = new Document(
                "Spring AI helps Java developers build AI applications using chat models, embeddings, vector stores, and RAG.",
                Map.of(
                        "source", "spring-ai-course",
                        "topic", "spring-ai"
                )
        );

        Document doc2 = new Document(
                "Retrieval-Augmented Generation retrieves relevant documents from a vector database and uses them as context for a chat model.",
                Map.of(
                        "source", "rag-guide",
                        "topic", "rag"
                )
        );

        Document doc3 = new Document(
                "PGVector is a PostgreSQL extension used to store and search vector embeddings for semantic search.",
                Map.of(
                        "source", "pgvector-guide",
                        "topic", "vector-database"
                )
        );

        vectorStore.add(List.of(doc1, doc2, doc3));
    }
}

Step 6: Create Search Service

package com.dhanish.rag.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class SemanticSearchService {

    private final VectorStore vectorStore;

    public SemanticSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> search(String question) {
        return vectorStore.similaritySearch(question);
    }
}

Step 7: Manual RAG with ChatClient

This approach gives full control over the RAG prompt.

package com.dhanish.rag.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class ManualRagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public ManualRagService(VectorStore vectorStore,
                            ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    public String answer(String question) {

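        // Retrieve the chunks most similar to the question, using the default search settings.
        List<Document> documents =
                vectorStore.similaritySearch(question);

        // Join the chunk texts into a single context block for the prompt.
        String context = documents.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n\n"));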

        return chatClient.prompt()
                .system("""
                        You are a helpful AI assistant.

                        Rules:
                        1. Answer only using the provided context.
                        2. Do not guess.
                        3. If the answer is not in the context, say:
                           I do not have enough information.
                        4. Keep the answer clear and practical.
                        """)
                .user("""
                      Context:
                      %s

                      Question:
                      %s
                      """.formatted(context, question))
                .call()
                .content();
    }
}
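
Because the manual approach controls the prompt, it can also cap how much context is sent and label each chunk with its source. The sketch below shows that idea in plain Java so it runs without Spring. RetrievedChunk and ContextBuilder are hypothetical names: RetrievedChunk stands in for a retrieved Document's text plus its "source" metadata value, and the character budget is an assumed stand-in for a token budget.

```java
import java.util.List;

// Hypothetical stand-in for a retrieved document: its text plus the "source" metadata value.
record RetrievedChunk(String source, String text) {
}

final class ContextBuilder {

    private ContextBuilder() {
    }

    // Appends chunks in order until adding the next one would exceed maxChars.
    static String build(List<RetrievedChunk> chunks, int maxChars) {
        StringBuilder context = new StringBuilder();
        for (RetrievedChunk chunk : chunks) {
            String entry = "[source: " + chunk.source() + "]\n" + chunk.text();
            int separator = context.length() == 0 ? 0 : 2;
            if (context.length() + separator + entry.length() > maxChars) {
                break; // stop at the first chunk that does not fit
            }
            if (separator > 0) {
                context.append("\n\n");
            }
            context.append(entry);
        }
        return context.toString();
    }
}
```

Since similarity search returns the best matches first, cutting from the end drops the least relevant chunks.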

Step 8: RAG with QuestionAnswerAdvisor

Spring AI also supports Advisor-based RAG. Advisors can enrich a ChatClient request with retrieved context automatically.

package com.dhanish.rag.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class AdvisorRagService {

    private final ChatClient chatClient;

    public AdvisorRagService(ChatClient.Builder builder,
                             VectorStore vectorStore) {

        this.chatClient = builder
                .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
                .build();
    }

    public String answer(String question) {

        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}

The Advisor approach is cleaner for standard RAG flows, while manual RAG is useful when you want complete control over prompt format and retrieval behavior.


Step 9: Create REST Controller

package com.dhanish.rag.controller;

import com.dhanish.rag.service.DocumentIngestionService;
import com.dhanish.rag.service.ManualRagService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final DocumentIngestionService ingestionService;
    private final ManualRagService ragService;

    public RagController(DocumentIngestionService ingestionService,
                         ManualRagService ragService) {
        this.ingestionService = ingestionService;
        this.ragService = ragService;
    }

    @PostMapping("/load")
    public String load() {
        ingestionService.loadSampleDocuments();
        return "Documents loaded into vector store successfully.";
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return ragService.answer(question);
    }
}

Step 10: Test RAG APIs

Load Documents

curl -X POST http://localhost:8080/api/rag/load

Ask Question

curl -G "http://localhost:8080/api/rag/ask" --data-urlencode "question=What is RAG used for?"

Expected RAG Request Flow

Client
  |
  v
/api/rag/ask
  |
  v
RagController
  |
  v
ManualRagService
  |
  v
VectorStore Similarity Search
  |
  v
Relevant Documents
  |
  v
ChatClient
  |
  v
Grounded Answer

Document Chunking Strategy

Real-world documents are usually large. Do not store an entire PDF or article as one vector.

Split documents into meaningful chunks:

  • By heading
  • By paragraph
  • By section
  • By page
  • By topic

Chunking Flow

Large Document
      |
      +-- Chunk 1: Introduction
      +-- Chunk 2: Architecture
      +-- Chunk 3: Code Example
      +-- Chunk 4: Best Practices
      +-- Chunk 5: Troubleshooting

Why Chunking Matters

  • Improves retrieval accuracy
  • Reduces irrelevant context
  • Controls token usage
  • Improves answer grounding
  • Makes source tracking easier
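
As a rough illustration of paragraph-based chunking, the sketch below packs consecutive paragraphs into chunks up to a size limit. ParagraphChunker is an illustrative name and not Spring AI's splitter. The limit is in characters for simplicity, whereas production splitters usually count tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal paragraph-based chunker; illustrative only.
final class ParagraphChunker {

    private ParagraphChunker() {
    }

    // Packs consecutive paragraphs into chunks of at most maxChars characters.
    // A single paragraph longer than maxChars becomes its own oversized chunk.
    static List<String> chunk(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Paragraphs are separated by one or more blank lines.
        for (String paragraph : text.split("\\R\\s*\\R")) {
            String trimmed = paragraph.strip();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (current.length() > 0
                    && current.length() + 2 + trimmed.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append("\n\n");
            }
            current.append(trimmed);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Keeping whole paragraphs together tends to preserve meaning better than cutting at a fixed character offset.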

Metadata Strategy

Always store metadata with chunks.

{
  "source": "spring-ai-rag-guide",
  "topic": "rag",
  "module": "spring-ai",
  "page": 3,
  "tenantId": "dhanish-empower"
}

Metadata helps:

  • Filtering
  • Source tracking
  • Citations
  • Debugging
  • Tenant isolation
  • Access control
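
For example, source tracking and citations can be derived directly from chunk metadata. The sketch below assumes each retrieved chunk exposes its metadata as a Map with a "source" key, as in the JSON above. SourceCitations is a hypothetical helper name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that turns chunk metadata into a citation list.
final class SourceCitations {

    private SourceCitations() {
    }

    // Collects distinct "source" values in retrieval order; chunks without the key are skipped.
    static List<String> fromMetadata(List<Map<String, Object>> chunkMetadata) {
        Set<String> sources = new LinkedHashSet<>();
        for (Map<String, Object> metadata : chunkMetadata) {
            Object source = metadata.get("source");
            if (source != null) {
                sources.add(source.toString());
            }
        }
        return List.copyOf(sources);
    }
}
```

The resulting list can be returned next to the answer so users can see which documents it came from.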

RAG Prompt Template

You are a helpful assistant.

Use only the provided context.

Context:
{context}

Question:
{question}

Rules:
1. Do not guess.
2. If the answer is missing, say you do not have enough information.
3. Keep the answer clear.
4. Mention the source if available.
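
One way to fill this template in plain Java is sketched below. RagPromptTemplate is an illustrative name. The slots are filled by position rather than with chained replace calls, so placeholder-like text inside the retrieved context is not substituted a second time.

```java
// Illustrative renderer for the template above.
final class RagPromptTemplate {

    private static final String CONTEXT_SLOT = "{context}";
    private static final String QUESTION_SLOT = "{question}";

    private static final String TEMPLATE = """
            You are a helpful assistant.

            Use only the provided context.

            Context:
            {context}

            Question:
            {question}

            Rules:
            1. Do not guess.
            2. If the answer is missing, say you do not have enough information.
            3. Keep the answer clear.
            4. Mention the source if available.
            """;

    private RagPromptTemplate() {
    }

    // Fills each slot exactly once, by position in the template.
    static String render(String context, String question) {
        int contextAt = TEMPLATE.indexOf(CONTEXT_SLOT);
        int questionAt = TEMPLATE.indexOf(QUESTION_SLOT);
        return TEMPLATE.substring(0, contextAt)
                + context
                + TEMPLATE.substring(contextAt + CONTEXT_SLOT.length(), questionAt)
                + question
                + TEMPLATE.substring(questionAt + QUESTION_SLOT.length());
    }
}
```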

Good RAG Answer Behavior

Situation                       | Correct Behavior
--------------------------------|-------------------------------
Context contains answer         | Answer using context
Context does not contain answer | Say information is unavailable
Context is conflicting          | Mention uncertainty
User asks unrelated question    | Do not hallucinate
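
The "context does not contain answer" case can partly be enforced in code before the model is called. The sketch below returns a fixed fallback when retrieval comes back empty. GroundedAnswerGuard is a hypothetical name, and the model parameter stands in for whatever function sends the context to the chat model.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical guard that skips the model call when there is nothing to ground the answer on.
final class GroundedAnswerGuard {

    static final String FALLBACK = "I do not have enough information.";

    private GroundedAnswerGuard() {
    }

    // Calls the model only when retrieval produced context; otherwise returns the fallback.
    static String answer(List<String> retrievedChunks, Function<String, String> model) {
        if (retrievedChunks.isEmpty()) {
            return FALLBACK;
        }
        return model.apply(String.join("\n\n", retrievedChunks));
    }
}
```

This covers only the empty-retrieval case. Conflicting or weakly related context still depends on the prompt rules.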

RAG for AI Agents

Agentic AI systems use RAG to retrieve knowledge before deciding actions.

User Goal
   |
   v
Agent Understands Intent
   |
   v
RAG Retrieves Relevant Knowledge
   |
   v
Agent Plans Action
   |
   v
Tool Execution
   |
   v
Final Response

RAG with Tool Calling

Some workflows need both RAG and tools.

Example

User:
Where is my order and what is the refund policy?

System flow:

  • Call Order API for live order status
  • Use RAG to retrieve refund policy
  • Generate combined answer

Hybrid RAG Architecture

User Question
      |
      +-- Tool Call for Live Data
      |
      +-- Vector Search for Policy Data
      |
      v
Combined Context
      |
      v
Chat Model
      |
      v
Final Answer
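
The combining step in this flow can be sketched as follows. The names are assumptions for illustration: orderStatusTool stands in for an Order API client and policySearch for a vector search call, and neither is a Spring AI type.

```java
import java.util.List;
import java.util.function.Function;

// Sketch of merging live tool output with retrieved policy text into one context block.
final class HybridContextAssembler {

    private HybridContextAssembler() {
    }

    static String assemble(String orderId,
                           String question,
                           Function<String, String> orderStatusTool,
                           Function<String, List<String>> policySearch) {
        // Live data comes from the tool call, policy text from retrieval.
        String liveData = orderStatusTool.apply(orderId);
        List<String> policies = policySearch.apply(question);
        return "Live order data:\n" + liveData
                + "\n\nPolicy context:\n" + String.join("\n", policies);
    }
}
```

Labeling the two sections separately helps the chat model distinguish live facts from policy text.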

Security in RAG

RAG systems can expose sensitive documents if access control is missing.

Always apply:

  • User authentication
  • Authorization checks
  • Tenant filters
  • Metadata filters
  • Safe logging
  • Document-level access control

Safe RAG Retrieval Flow

User Request
      |
      v
Authenticate User
      |
      v
Check Permissions
      |
      v
Apply Tenant Filter
      |
      v
Vector Search
      |
      v
Allowed Documents Only

Multi-Tenant RAG Example

In a SaaS application:

Tenant A user → search only Tenant A documents
Tenant B user → search only Tenant B documents

Without tenant isolation, one user may receive another customer’s data.
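
The rule itself is simple, as the in-memory sketch below shows. TenantChunk and TenantScopedSearch are hypothetical names. In a real deployment the tenant restriction should be part of the vector store query itself (for example as a metadata filter), so that other tenants' chunks are never returned in the first place.

```java
import java.util.List;

// Hypothetical chunk type carrying the tenant it belongs to.
record TenantChunk(String tenantId, String text) {
}

final class TenantScopedSearch {

    private TenantScopedSearch() {
    }

    // authenticatedTenantId must come from the verified session or token, never from request input.
    static List<TenantChunk> restrictToTenant(List<TenantChunk> candidates,
                                              String authenticatedTenantId) {
        return candidates.stream()
                .filter(chunk -> chunk.tenantId().equals(authenticatedTenantId))
                .toList();
    }
}
```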


Evaluating RAG Quality

RAG quality depends on retrieval quality and answer quality.

Measure:

  • Did retrieval find the right documents?
  • Did the answer use retrieved context?
  • Did the answer avoid guessing?
  • Was the answer clear?
  • Was the source correct?

RAG Evaluation Dataset Example

Question                 | Expected Source | Expected Behavior
-------------------------|-----------------|---------------------------
What is PGVector?        | pgvector-guide  | Explain PGVector
What is refund timeline? | refund-policy   | Answer from policy
What is CEO salary?      | None            | Say not enough information
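
A dataset like this can drive a simple retrieval check. The sketch below computes the share of questions where retrieval behaved as expected. EvalCase and RetrievalEvaluator are illustrative names, and retrieveSources is an assumed function that returns the "source" values of the retrieved chunks for a question.

```java
import java.util.List;
import java.util.function.Function;

// expectedSource is null when no document should match (the "None" row above).
record EvalCase(String question, String expectedSource) {
}

final class RetrievalEvaluator {

    private RetrievalEvaluator() {
    }

    // Fraction of cases where retrieval behaved as expected:
    // the expected source was returned, or nothing was returned when none was expected.
    static double hitRate(List<EvalCase> cases, Function<String, List<String>> retrieveSources) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        long hits = cases.stream()
                .filter(c -> {
                    List<String> sources = retrieveSources.apply(c.question());
                    return c.expectedSource() == null
                            ? sources.isEmpty()
                            : sources.contains(c.expectedSource());
                })
                .count();
        return (double) hits / cases.size();
    }
}
```

This measures retrieval only. Answer quality still needs a separate check, such as human review of sampled answers.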

Monitoring RAG in Production

Track:

  • Vector search latency
  • Embedding generation time
  • Empty retrieval count
  • Average similarity score
  • Top-K relevance
  • RAG fallback rate
  • User feedback score
  • Hallucination reports
  • Token usage
  • Cost per answer
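
Two of these signals, vector search latency and empty retrieval count, can be captured with a few counters. The sketch below is a minimal in-process version under the assumption that the service records one call per search. RagMetrics is a hypothetical class, and a production service would typically export the values through its metrics library instead.

```java
import java.util.concurrent.atomic.LongAdder;

// Minimal thread-safe counters for retrieval metrics.
final class RagMetrics {

    private final LongAdder requests = new LongAdder();
    private final LongAdder emptyRetrievals = new LongAdder();
    private final LongAdder totalSearchMillis = new LongAdder();

    // Call once per vector search with its duration and the number of chunks returned.
    void recordSearch(long searchMillis, int retrievedCount) {
        requests.increment();
        totalSearchMillis.add(searchMillis);
        if (retrievedCount == 0) {
            emptyRetrievals.increment();
        }
    }

    long emptyRetrievalCount() {
        return emptyRetrievals.sum();
    }

    double averageSearchMillis() {
        long count = requests.sum();
        return count == 0 ? 0.0 : (double) totalSearchMillis.sum() / count;
    }
}
```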

Production RAG Monitoring Flow

RAG Request
   |
   +-- Retrieval Metrics
   +-- Prompt Metrics
   +-- LLM Metrics
   +-- Answer Quality Feedback
   |
   v
Observability Dashboard

Common RAG Mistakes

1. Poor Chunking

Large or random chunks reduce retrieval quality.

2. No Metadata

Without metadata, retrieved chunks are difficult to filter, cite, and debug.

3. No Access Control

Sensitive data may leak.

4. Asking Model to Guess

The RAG prompt should explicitly instruct the model to avoid answers that the retrieved context does not support.

5. Too Many Retrieved Chunks

This increases token cost and may confuse the model.
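
A common remedy is to keep only the best few chunks and drop those below a minimum score. The sketch below shows that selection over already-scored results. ScoredChunk and TopKSelector are illustrative types, and the values of k and minScore are assumptions to tune per dataset. A real vector store applies top-k and thresholds inside the search query.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative pairing of a chunk with its similarity score.
record ScoredChunk(String text, double score) {
}

final class TopKSelector {

    private TopKSelector() {
    }

    // Keeps at most k chunks whose score is at or above minScore, best first.
    static List<ScoredChunk> select(List<ScoredChunk> candidates, int k, double minScore) {
        return candidates.stream()
                .filter(c -> c.score() >= minScore)
                .sorted(Comparator.comparingDouble(ScoredChunk::score).reversed())
                .limit(k)
                .toList();
    }
}
```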

6. Not Updating Embeddings

Changed documents require updated embeddings.
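
One way to detect changed content is to store a fingerprint of each chunk's text next to its embedding and compare it on the next ingestion run. The sketch below uses SHA-256 from the JDK. ContentFingerprint is a hypothetical helper, and where the fingerprint is stored (for example in chunk metadata) is left as an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper for deciding whether a chunk must be embedded again.
final class ContentFingerprint {

    private ContentFingerprint() {
    }

    // SHA-256 of the chunk text, hex-encoded.
    static String sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // True when the stored fingerprint is missing or differs from the current text.
    static boolean needsReembedding(String storedFingerprint, String currentText) {
        return storedFingerprint == null || !storedFingerprint.equals(sha256(currentText));
    }
}
```

Unchanged chunks can then be skipped, which avoids paying for embeddings that would come out identical.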


Best Practices for RAG

  • Use meaningful chunks
  • Add metadata to every document
  • Use consistent embedding models
  • Use access control before retrieval
  • Keep top-k reasonable
  • Use clear RAG prompts
  • Monitor retrieval quality
  • Evaluate answers with real user questions
  • Track source documents
  • Re-index changed documents
  • Use fallback responses when context is missing

Production RAG Architecture

Frontend
   |
   v
Spring Boot AI API
   |
   +-- Authentication
   +-- Document Retrieval
   +-- VectorStore
   +-- Prompt Builder
   +-- ChatClient
   +-- Response Validator
   +-- Monitoring
   |
   v
Grounded Answer

Common Errors and Fixes

1. Empty Answers

Possible causes:

  • No documents loaded
  • Poor similarity match
  • Wrong vector store configuration
  • Embedding model mismatch

2. Wrong Answers

Possible causes:

  • Irrelevant chunks retrieved
  • Weak prompt instructions
  • Too much unrelated context
  • Documents are outdated

3. Dimension Mismatch

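The vector store dimension must match the embedding model dimension. For example, text-embedding-3-small produces 1536-dimensional vectors by default, so the PGVector dimensions setting must also be 1536. After switching embedding models, update the setting and re-embed the existing documents.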
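
A fail-fast check at startup can surface this problem before any search runs. The sketch below compares the configured dimension with the length of one probe embedding. EmbeddingDimensionCheck is a hypothetical name, and obtaining the probe embedding from the embedding model is left to the application.

```java
// Hypothetical startup check; not part of Spring AI.
final class EmbeddingDimensionCheck {

    private EmbeddingDimensionCheck() {
    }

    // Fails fast when the configured vector size differs from what the model returns.
    static void verify(int configuredDimensions, float[] probeEmbedding) {
        if (probeEmbedding.length != configuredDimensions) {
            throw new IllegalStateException(
                    "Vector store expects " + configuredDimensions
                            + " dimensions but the embedding model returned "
                            + probeEmbedding.length);
        }
    }
}
```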


4. Slow RAG Response

Possible fixes:

  • Optimize vector index
  • Reduce top-k
  • Use caching
  • Use faster model
  • Shorten context
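
As an illustration of the caching option, the sketch below keeps the most recently used answers in a small LRU map. AnswerCache is a hypothetical class, and ragCall stands in for the full retrieval and generation step. Cached answers can go stale when documents change, so entries should be cleared on re-indexing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Small LRU cache for repeated questions. Not thread-safe: guard it or use a
// concurrent cache when several requests share one instance.
final class AnswerCache {

    private final Map<String, String> cache;

    AnswerCache(int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached answer, or computes, stores, and returns a new one.
    String getOrCompute(String question, Function<String, String> ragCall) {
        String cached = cache.get(question);
        if (cached != null) {
            return cached;
        }
        String answer = ragCall.apply(question);
        cache.put(question, answer);
        return answer;
    }
}
```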

5. Hallucination Despite RAG

Fix:

  • Strengthen prompt rules
  • Use better retrieval
  • Add answer validation
  • Reject unsupported claims
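
Answer validation can start with something as simple as a lexical overlap check. The sketch below measures what share of the answer's words also appear in the retrieved context. GroundingCheck is an illustrative name and the approach is deliberately crude: it cannot detect paraphrased or subtly wrong claims, so a low score is best treated as a signal for review rather than proof of hallucination.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Crude lexical grounding check, for illustration only.
final class GroundingCheck {

    private GroundingCheck() {
    }

    // Share of distinct answer words that also occur in the context, from 0.0 to 1.0.
    static double overlap(String answer, String context) {
        Set<String> contextWords = words(context);
        Set<String> answerWords = words(answer);
        if (answerWords.isEmpty()) {
            return 0.0;
        }
        long supported = answerWords.stream().filter(contextWords::contains).count();
        return (double) supported / answerWords.size();
    }

    // Lowercases the text and splits it on anything that is not a letter or digit.
    private static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
    }
}
```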

Interview Questions

Q1: What is RAG?

RAG stands for Retrieval-Augmented Generation. It retrieves relevant external knowledge and gives it to a language model to generate grounded answers.

Q2: Why is RAG needed?

RAG helps AI systems answer using updated, private, and domain-specific data instead of relying only on model training knowledge.

Q3: What are the main components of RAG?

Documents, chunking, embeddings, vector store, retriever, prompt builder, and chat model.

Q4: What is the role of VectorStore in RAG?

VectorStore stores embeddings and retrieves semantically similar documents for the user question.

Q5: What is QuestionAnswerAdvisor?

QuestionAnswerAdvisor is a Spring AI Advisor that supports common RAG flows by retrieving relevant context from a vector store and adding it to the chat request.


Advanced Interview Questions

Q1: Manual RAG vs Advisor-based RAG?

Manual RAG gives full control over retrieval and prompt construction, while Advisor-based RAG provides cleaner integration for common use cases.

Q2: How do you reduce hallucination in RAG?

Use high-quality retrieval, strong prompts, response validation, source tracking, and instructions to avoid guessing.

Q3: Why is metadata important in RAG?

Metadata supports filtering, tenant isolation, source citation, debugging, and access control.

Q4: What happens if the embedding model changes?

Existing vectors may need to be regenerated because dimensions or semantic vector space may change.

Q5: How do you secure multi-tenant RAG?

Authenticate users, enforce authorization, apply tenant metadata filters, and retrieve only allowed documents.


Summary

Retrieval-Augmented Generation is a powerful architecture for building reliable AI applications that answer using your own knowledge base. Instead of depending only on the model’s internal knowledge, RAG retrieves relevant documents and uses them as context for the final answer.

In Spring AI, RAG can be implemented using embeddings, VectorStore, ChatClient, and Advisor APIs such as QuestionAnswerAdvisor.

For production systems such as learning platforms, banking assistants, e-commerce support bots, SaaS knowledge bases, and enterprise AI agents, RAG improves factual accuracy, context awareness, user trust, and answer quality.

A strong RAG system depends on good chunking, high-quality embeddings, secure retrieval, metadata filtering, clear prompts, monitoring, and regular evaluation.

About the Author

Naresh Kumar

Senior Java Backend Engineer experienced in Banking, Payments, ISO 20022, Spring Boot, Microservices, Kafka, Docker, Kubernetes, AWS and Cloud Native Systems.

Built enterprise payment solutions, transaction processing systems, API platforms and scalable microservices used in production.
