Implementing Retrieval-Augmented Generation (RAG) with Spring AI

Retrieval-Augmented Generation, commonly called RAG, is one of the most important architectures for building reliable AI applications. A normal chat model answers using the knowledge it learned during training. But enterprise applications usually need answers from private, updated, domain-specific data such as documents, FAQs, policies, product catalogs, interview questions, course content, tickets, or database records.

RAG solves this problem by retrieving relevant information from your own knowledge base and giving that context to the AI model before generating the final answer.

Spring AI provides support for RAG flows using VectorStore, embeddings, ChatClient, and Advisor APIs. The Spring AI documentation explains that RAG helps overcome LLM limitations around long-form content, factual accuracy, and context awareness, and it provides Advisor-based support such as QuestionAnswerAdvisor for common RAG workflows. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html))

What is RAG?

RAG stands for Retrieval-Augmented Generation.

It combines two steps:

Retrieval: Find relevant information from your own documents or database
Generation: Send the retrieved context to an AI model and generate an answer

Simple RAG Flow

User Question
      |
      v
Search Knowledge Base
      |
      v
Retrieve Relevant Context
      |
      v
Send Context + Question to LLM
      |
      v
Generate Grounded Answer
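The flow above can be sketched in plain Java without any framework. This is a toy illustration only: it assumes retrieval by simple word overlap instead of embeddings, and the class and method names (SimpleRagFlow, retrieve, buildPrompt) are hypothetical, not part of Spring AI. It shows the two steps in order: pick the most relevant text, then place it in front of the question.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Toy retrieve-then-generate flow. Word overlap stands in for vector search.
class SimpleRagFlow {

    // Lowercase the text and split on anything that is not a letter or digit.
    static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
    }

    // Score a document by how many distinct question words it contains.
    static long overlap(String question, String document) {
        Set<String> documentWords = words(document);
        return words(question).stream().filter(documentWords::contains).count();
    }

    // Retrieval step: keep at most topK documents that share a word with the question.
    static List<String> retrieve(String question, List<String> knowledgeBase, int topK) {
        return knowledgeBase.stream()
                .filter(doc -> overlap(question, doc) > 0)
                .sorted(Comparator.comparingLong((String doc) -> overlap(question, doc)).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    // Augmentation step: put the retrieved context in front of the question.
    static String buildPrompt(String question, List<String> context) {
        return "Context:\n" + String.join("\n", context) + "\n\nQuestion:\n" + question;
    }

    public static void main(String[] args) {
        List<String> knowledgeBase = List.of(
                "PGVector is a PostgreSQL extension used to store and search vector embeddings.",
                "Retrieval-Augmented Generation retrieves relevant documents and uses them as context.");
        String question = "What is PGVector?";
        // In a real system this prompt would be sent to the chat model.
        System.out.println(buildPrompt(question, retrieve(question, knowledgeBase, 1)));
    }
}
```

A real RAG system replaces the word-overlap score with embedding similarity, but the order of the steps stays the same.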

Why Is RAG Needed?

Without RAG, an AI model may:

Give outdated answers
Guess information
Hallucinate facts
Fail to answer company-specific questions
Miss private business knowledge
Provide generic responses

With RAG, the model receives real context from your documents before answering.

Normal LLM vs RAG-Based LLM

Normal LLM	RAG-Based LLM
Uses model training knowledge	Uses retrieved enterprise knowledge
May hallucinate	More grounded in provided context
May be outdated	Can use updated documents
Generic answers	Domain-specific answers
No source tracking	Can track source documents

Real-World Learning Platform Example

Suppose a learning platform has course content about:

Java
Spring Boot
Microservices
Docker
Kubernetes
Spring AI
Agentic AI

User asks:

Which course should I learn to build scalable backend systems?

A RAG system can search existing course content, retrieve relevant lessons, and generate a personalized answer using platform-specific content.

Real-World Banking Example

A banking AI assistant may answer questions using verified banking documents.

User:
Amount was debited but UPI transaction failed. When will it be reversed?

RAG flow:

Search failed UPI transaction policy
Retrieve reversal timeline document
Send policy context to chat model
Generate grounded response

The model should not guess. It should answer only from the retrieved banking policy.

Real-World E-Commerce Example

An e-commerce AI assistant may answer refund and delivery questions.

User:
Can I return a damaged mobile phone after delivery?

RAG retrieves:

Return policy
Damaged product policy
Refund processing timeline
Replacement rules

The AI then generates a clear customer-friendly answer.

Core Components of RAG

Component	Purpose
Documents	Knowledge source
Chunking	Splits large documents into smaller parts
Embedding Model	Converts text into vectors
Vector Store	Stores and searches embeddings
Retriever	Finds relevant chunks
Chat Model	Generates answer using retrieved context
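To make the Embedding Model, Vector Store, and Retriever rows more concrete, here is a minimal sketch of how similarity search can rank chunks. It assumes cosine similarity over small hand-written vectors; real embeddings have hundreds or thousands of dimensions and come from an embedding model. The class name CosineSimilarityDemo and the sample vectors are made up for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CosineSimilarityDemo {

    // cos(a, b) = (a . b) / (|a| * |b|). A value near 1.0 means the vectors point the same way.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector has no direction, so treat it as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return chunk ids ordered from most to least similar to the query vector.
    static List<String> rank(double[] query, Map<String, double[]> chunks) {
        return chunks.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(query, e.getValue())).reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up 2-dimensional "embeddings" for two chunks.
        Map<String, double[]> chunks = Map.of(
                "pgvector-guide", new double[] {0.9, 0.1},
                "rag-guide", new double[] {0.1, 0.9});
        System.out.println(rank(new double[] {1.0, 0.0}, chunks));
    }
}
```

A vector database does the same ranking, but uses an index so it does not have to compare the query against every stored vector.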

Spring AI RAG Architecture

Documents
   |
   v
Text Extraction
   |
   v
Chunking
   |
   v
EmbeddingModel
   |
   v
VectorStore
   |
   v
Similarity Search
   |
   v
ChatClient
   |
   v
Grounded Answer

Spring AI Building Blocks for RAG

Spring AI provides important abstractions for RAG:

Document
EmbeddingModel
VectorStore
ChatClient
QuestionAnswerAdvisor
VectorStoreRetriever

VectorStoreRetriever is a read-only view of a vector store that exposes only similarity search. It is useful in RAG applications where the application needs retrieval access but should not modify vector data. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/vectordbs.html))

Step 1: Choose a Vector Database

RAG needs a vector store to save and search embeddings.

Common options:

PGVector
Pinecone
MongoDB Atlas Vector Search
Redis Vector Search
Qdrant
Milvus
Weaviate
Elasticsearch
OpenSearch

When to Use PGVector?

You already use PostgreSQL
Your dataset is small to medium
You want simple local development
You prefer SQL-based infrastructure

When to Use Pinecone or Cloud Vector DB?

You need managed vector infrastructure
You expect large-scale vector search
You want easier scaling
You need production-ready cloud vector retrieval

Step 2: Add Dependencies

Example using OpenAI and PGVector:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <scope>runtime</scope>
</dependency>

Advisor Dependency for Spring AI RAG

The Spring AI documentation states that QuestionAnswerAdvisor and VectorStoreChatMemoryAdvisor require the spring-ai-advisors-vector-store dependency. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html))

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>

Step 3: Configure application.properties

spring.application.name=spring-ai-rag-demo

spring.ai.model.chat=openai
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini

spring.ai.model.embedding=openai
spring.ai.openai.embedding.options.model=text-embedding-3-small

spring.datasource.url=jdbc:postgresql://localhost:5432/spring_ai
spring.datasource.username=postgres
spring.datasource.password=postgres

spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=1536

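The configured vector dimension must match the embedding model's output dimension. OpenAI's text-embedding-3-small returns 1536-dimensional vectors by default, which is why dimensions is set to 1536 here.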
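A mismatch usually surfaces only when the first vector is written or searched, so it can help to fail fast. The following is a minimal sketch of such a check in plain Java. It assumes you can obtain one sample embedding at startup; the class and method names are hypothetical and not part of Spring AI.

```java
class DimensionCheck {

    // Throw early if the configured dimension differs from what the embedding model returns.
    static void requireMatchingDimensions(int configuredDimensions, float[] sampleEmbedding) {
        if (sampleEmbedding.length != configuredDimensions) {
            throw new IllegalStateException(
                    "Vector store is configured for " + configuredDimensions
                            + " dimensions but the embedding model returned "
                            + sampleEmbedding.length);
        }
    }

    public static void main(String[] args) {
        // A zero-filled array stands in for a real embedding of the expected size.
        requireMatchingDimensions(1536, new float[1536]);
        System.out.println("Dimensions match");
    }
}
```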

Step 4: Start PGVector with Docker

docker run --name pgvector-db \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=spring_ai \
  -p 5432:5432 \
  -d pgvector/pgvector:pg16

Enable the extension:

docker exec -it pgvector-db psql -U postgres -d spring_ai

CREATE EXTENSION IF NOT EXISTS vector;

Step 5: Create Document Ingestion Service

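The ingestion service adds documents to the vector store. Each document carries its text plus a metadata map (here, source and topic) that can later be used for filtering and citations.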

package com.dhanish.rag.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;

@Service
public class DocumentIngestionService {

    private final VectorStore vectorStore;

    public DocumentIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void loadSampleDocuments() {

        Document doc1 = new Document(
                "Spring AI helps Java developers build AI applications using chat models, embeddings, vector stores, and RAG.",
                Map.of(
                        "source", "spring-ai-course",
                        "topic", "spring-ai"
                )
        );

        Document doc2 = new Document(
                "Retrieval-Augmented Generation retrieves relevant documents from a vector database and uses them as context for a chat model.",
                Map.of(
                        "source", "rag-guide",
                        "topic", "rag"
                )
        );

        Document doc3 = new Document(
                "PGVector is a PostgreSQL extension used to store and search vector embeddings for semantic search.",
                Map.of(
                        "source", "pgvector-guide",
                        "topic", "vector-database"
                )
        );

        vectorStore.add(List.of(doc1, doc2, doc3));
    }
}

Step 6: Create Search Service

package com.dhanish.rag.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class SemanticSearchService {

    private final VectorStore vectorStore;

    public SemanticSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> search(String question) {
        return vectorStore.similaritySearch(question);
    }
}

Step 7: Manual RAG with ChatClient

This approach gives full control over the RAG prompt.

package com.dhanish.rag.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class ManualRagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public ManualRagService(VectorStore vectorStore,
                            ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    public String answer(String question) {

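        // Retrieve the chunks most similar to the question (default top-k).
        List<Document> documents =
                vectorStore.similaritySearch(question);

        // Join the chunk texts into one context block for the prompt.
        String context = documents.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n\n"));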

        return chatClient.prompt()
                .system("""
                        You are a helpful AI assistant.

                        Rules:
                        1. Answer only using the provided context.
                        2. Do not guess.
                        3. If the answer is not in the context, say:
                           I do not have enough information.
                        4. Keep the answer clear and practical.
                        """)
                .user("""
                      Context:
                      %s

                      Question:
                      %s
                      """.formatted(context, question))
                .call()
                .content();
    }
}

Step 8: RAG with QuestionAnswerAdvisor

Spring AI also supports Advisor-based RAG. Advisors can enrich a ChatClient request with retrieved context automatically.

package com.dhanish.rag.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class AdvisorRagService {

    private final ChatClient chatClient;

    public AdvisorRagService(ChatClient.Builder builder,
                             VectorStore vectorStore) {

        this.chatClient = builder
                .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
                .build();
    }

    public String answer(String question) {

        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}

The Advisor approach is cleaner for standard RAG flows, while manual RAG is useful when you want complete control over prompt format and retrieval behavior.

Step 9: Create REST Controller

package com.dhanish.rag.controller;

import com.dhanish.rag.service.DocumentIngestionService;
import com.dhanish.rag.service.ManualRagService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final DocumentIngestionService ingestionService;
    private final ManualRagService ragService;

    public RagController(DocumentIngestionService ingestionService,
                         ManualRagService ragService) {
        this.ingestionService = ingestionService;
        this.ragService = ragService;
    }

    @PostMapping("/load")
    public String load() {
        ingestionService.loadSampleDocuments();
        return "Documents loaded into vector store successfully.";
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return ragService.answer(question);
    }
}

Step 10: Test RAG APIs

Load Documents

curl -X POST http://localhost:8080/api/rag/load

Ask Question

curl -G "http://localhost:8080/api/rag/ask" --data-urlencode "question=What is RAG used for?"

Expected RAG Request Flow

Client
  |
  v
/api/rag/ask
  |
  v
RagController
  |
  v
ManualRagService
  |
  v
VectorStore Similarity Search
  |
  v
Relevant Documents
  |
  v
ChatClient
  |
  v
Grounded Answer

Document Chunking Strategy

Real-world documents are usually large. Do not store an entire PDF or article as one vector.

Split documents into meaningful chunks:

By heading
By paragraph
By section
By page
By topic

Chunking Flow

Large Document
      |
      +-- Chunk 1: Introduction
      +-- Chunk 2: Architecture
      +-- Chunk 3: Code Example
      +-- Chunk 4: Best Practices
      +-- Chunk 5: Troubleshooting
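As one possible way to implement paragraph-based chunking, the sketch below splits a document on blank lines and packs paragraphs into chunks up to a character limit. It is an assumption-laden illustration: the ParagraphChunker name and the character-based limit are hypothetical, and production code would more likely count tokens (Spring AI ships splitters such as TokenTextSplitter for that purpose).

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphChunker {

    // Split on blank lines, then pack whole paragraphs into chunks of at most maxChars.
    // A single paragraph longer than maxChars is kept whole as its own chunk.
    static List<String> chunk(String document, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : document.split("\\R\\s*\\R")) {
            String p = paragraph.strip();
            if (p.isEmpty()) {
                continue;
            }
            // Start a new chunk when adding this paragraph (plus separator) would exceed the limit.
            if (current.length() > 0 && current.length() + 2 + p.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append("\n\n");
            }
            current.append(p);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String document = "Introduction text.\n\nArchitecture text.\n\nCode example text.";
        chunk(document, 40).forEach(c -> System.out.println("--- chunk ---\n" + c));
    }
}
```

Keeping paragraphs intact means each chunk stays readable on its own, which tends to help both retrieval and citation.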

Why Does Chunking Matter?

Improves retrieval accuracy
Reduces irrelevant context
Controls token usage
Improves answer grounding
Makes source tracking easier

Metadata Strategy

Always store metadata with chunks.

{
  "source": "spring-ai-rag-guide",
  "topic": "rag",
  "module": "spring-ai",
  "page": 3,
  "tenantId": "dhanish-empower"
}

Metadata helps:

Filtering
Source tracking
Citations
Debugging
Tenant isolation
Access control
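To illustrate two of these uses, filtering and citations, here is a small plain-Java sketch. The Chunk record and the helper methods are hypothetical stand-ins for whatever document type your vector store returns; the metadata keys (source, page, topic) follow the JSON example above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SourceCitations {

    // Hypothetical chunk type: the text plus the metadata stored alongside it.
    record Chunk(String text, Map<String, Object> metadata) {}

    // Filtering: keep only chunks whose metadata contains the given key/value pair.
    static List<Chunk> filter(List<Chunk> chunks, String key, Object value) {
        return chunks.stream()
                .filter(c -> value.equals(c.metadata().get(key)))
                .collect(Collectors.toList());
    }

    // Citations: build a de-duplicated "source (page N)" list for the retrieved chunks.
    static List<String> citations(List<Chunk> chunks) {
        return chunks.stream()
                .map(c -> c.metadata().getOrDefault("source", "unknown")
                        + " (page " + c.metadata().getOrDefault("page", "?") + ")")
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Chunk> retrieved = List.of(
                new Chunk("RAG retrieves context.", Map.of("source", "spring-ai-rag-guide", "page", 3, "topic", "rag")),
                new Chunk("PGVector stores vectors.", Map.of("source", "pgvector-guide", "topic", "vector-database")));
        System.out.println(citations(filter(retrieved, "topic", "rag")));
    }
}
```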

RAG Prompt Template

You are a helpful assistant.

Use only the provided context.

Context:
{context}

Question:
{question}

Rules:
1. Do not guess.
2. If the answer is missing, say you do not have enough information.
3. Keep the answer clear.
4. Mention the source if available.
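One way to fill the {context} and {question} placeholders is a single-pass replacement, sketched below in plain Java. The class name RagPromptTemplate is hypothetical. The single pass is a deliberate choice: if the retrieved context happens to contain the literal text {question}, it is left untouched instead of being substituted a second time.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RagPromptTemplate {

    static final String TEMPLATE = """
            You are a helpful assistant.

            Use only the provided context.

            Context:
            {context}

            Question:
            {question}

            Rules:
            1. Do not guess.
            2. If the answer is missing, say you do not have enough information.
            3. Keep the answer clear.
            4. Mention the source if available.
            """;

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(context|question)\\}");

    // Replace both placeholders in one pass so inserted text is never re-scanned.
    static String render(String context, String question) {
        Map<String, String> values = Map.of("context", context, "question", question);
        return PLACEHOLDER.matcher(TEMPLATE)
                .replaceAll(match -> Matcher.quoteReplacement(values.get(match.group(1))));
    }

    public static void main(String[] args) {
        System.out.println(render(
                "PGVector is a PostgreSQL extension for vector search.",
                "What is PGVector?"));
    }
}
```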

Good RAG Answer Behavior

Situation	Correct Behavior
Context contains answer	Answer using context
Context does not contain answer	Say information is unavailable
Context is conflicting	Mention uncertainty
User asks unrelated question	Do not hallucinate
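The first two rows of this table can be enforced in application code before the model is even called. The sketch below is a minimal illustration under stated assumptions: the chat model is represented by a plain Function, and the class name GroundedAnswerGuard is hypothetical. When retrieval returns nothing, it returns the fallback sentence directly instead of letting the model improvise.

```java
import java.util.List;
import java.util.function.Function;

class GroundedAnswerGuard {

    static final String FALLBACK = "I do not have enough information.";

    // chatModel is a stand-in for the real model call: it takes a prompt and returns an answer.
    static String answer(String question,
                         List<String> retrievedChunks,
                         Function<String, String> chatModel) {
        // No context means nothing to ground the answer in, so skip the model call.
        if (retrievedChunks.isEmpty()) {
            return FALLBACK;
        }
        String prompt = "Context:\n" + String.join("\n\n", retrievedChunks)
                + "\n\nQuestion:\n" + question;
        return chatModel.apply(prompt);
    }

    public static void main(String[] args) {
        // Fake model that only reports the prompt size, for demonstration.
        Function<String, String> fakeModel = prompt -> "Answer based on " + prompt.length() + " prompt characters";
        System.out.println(answer("What is CEO salary?", List.of(), fakeModel));
        System.out.println(answer("What is PGVector?", List.of("PGVector is a PostgreSQL extension."), fakeModel));
    }
}
```

Skipping the model call on empty retrieval also saves tokens, though it only covers the case where nothing was retrieved; irrelevant chunks still need prompt rules or validation.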

RAG for AI Agents

Agentic AI systems use RAG to retrieve knowledge before deciding which actions to take.

User Goal
   |
   v
Agent Understands Intent
   |
   v
RAG Retrieves Relevant Knowledge
   |
   v
Agent Plans Action
   |
   v
Tool Execution
   |
   v
Final Response

RAG with Tool Calling

Some workflows need both RAG and tools.

Example

User:
Where is my order and what is the refund policy?

System flow:

Call Order API for live order status
Use RAG to retrieve refund policy
Generate combined answer

Hybrid RAG Architecture

User Question
      |
      +-- Tool Call for Live Data
      |
      +-- Vector Search for Policy Data
      |
      v
Combined Context
      |
      v
Chat Model
      |
      v
Final Answer
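The combined-context step in this diagram could look roughly like the following plain-Java sketch. Everything here is illustrative: the order status tool is modeled as a Function, the policy chunks are passed in as strings, and the names (HybridContextBuilder, buildContext) and sample values are made up.

```java
import java.util.List;
import java.util.function.Function;

class HybridContextBuilder {

    // Merge live tool output and retrieved policy text into one labeled context block.
    static String buildContext(String orderId,
                               Function<String, String> orderStatusTool,
                               List<String> policyChunks) {
        String liveData = orderStatusTool.apply(orderId);
        return "Live data (from tool call):\n" + liveData
                + "\n\nPolicy (from vector search):\n" + String.join("\n", policyChunks);
    }

    public static void main(String[] args) {
        // Stand-in for a real Order API call.
        Function<String, String> fakeOrderApi = id -> "Order " + id + " status: SHIPPED";
        // Placeholder text; a real system would pass chunks returned by the vector store.
        List<String> policy = List.of("<refund policy chunk retrieved from the vector store>");
        System.out.println(buildContext("ORDER-123", fakeOrderApi, policy));
    }
}
```

Labeling each part of the context makes it easier for the model, and for anyone debugging the prompt, to tell live data from policy text.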

Security in RAG

RAG systems can expose sensitive documents if access control is missing.

Always apply:

User authentication
Authorization checks
Tenant filters
Metadata filters
Safe logging
Document-level access control

Safe RAG Retrieval Flow

User Request
      |
      v
Authenticate User
      |
      v
Check Permissions
      |
      v
Apply Tenant Filter
      |
      v
Vector Search
      |
      v
Allowed Documents Only

Multi-Tenant RAG Example

In a SaaS application:

Tenant A user → search only Tenant A documents
Tenant B user → search only Tenant B documents

Without tenant isolation, one user may receive another customer's data.
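The key point is that the tenant filter is applied before any relevance matching, using a tenant id taken from the authenticated session rather than from the request. Here is a minimal in-memory sketch of that idea. The StoredChunk record, the keyword match (standing in for vector search), and all names are hypothetical; with a real vector store the same restriction would typically be expressed as a metadata filter on the search request.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class TenantScopedSearch {

    // Hypothetical stored chunk: the owning tenant plus the chunk text.
    record StoredChunk(String tenantId, String text) {}

    // authenticatedTenantId must come from the verified session, never from user input.
    static List<String> search(String authenticatedTenantId, String keyword, List<StoredChunk> store) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        return store.stream()
                // Isolation first: drop every chunk that belongs to another tenant.
                .filter(c -> c.tenantId().equals(authenticatedTenantId))
                // Then relevance: a keyword match stands in for similarity search here.
                .filter(c -> c.text().toLowerCase(Locale.ROOT).contains(needle))
                .map(StoredChunk::text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<StoredChunk> store = List.of(
                new StoredChunk("tenant-a", "Tenant A refund policy"),
                new StoredChunk("tenant-b", "Tenant B refund policy"));
        System.out.println(search("tenant-a", "refund", store));
    }
}
```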

Evaluating RAG Quality

RAG quality depends on retrieval quality and answer quality.

Measure:

Did retrieval find the right documents?
Did the answer use retrieved context?
Did the answer avoid guessing?
Was the answer clear?
Was the source correct?

RAG Evaluation Dataset Example

Question	Expected Source	Expected Behavior
What is PGVector?	pgvector-guide	Explain PGVector
What is refund timeline?	refund-policy	Answer from policy
What is CEO salary?	None	Say not enough information
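A dataset like this can drive a simple automated check of retrieval quality. The sketch below computes a hit rate: the fraction of questions for which the expected source was retrieved, or for which nothing was retrieved when no source is expected. It is only one possible metric, and the names (RetrievalEvaluation, EvalCase, hitRate) and the retriever Function are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.Function;

class RetrievalEvaluation {

    // expectedSource == null means the knowledge base should return nothing for this question.
    record EvalCase(String question, String expectedSource) {}

    // retriever maps a question to the list of source ids of the retrieved chunks.
    static double hitRate(List<EvalCase> cases, Function<String, List<String>> retriever) {
        if (cases.isEmpty()) {
            return 0.0;
        }
        long hits = cases.stream().filter(c -> {
            List<String> sources = retriever.apply(c.question());
            return c.expectedSource() == null
                    ? sources.isEmpty()
                    : sources.contains(c.expectedSource());
        }).count();
        return (double) hits / cases.size();
    }

    public static void main(String[] args) {
        List<EvalCase> cases = List.of(
                new EvalCase("What is PGVector?", "pgvector-guide"),
                new EvalCase("What is refund timeline?", "refund-policy"),
                new EvalCase("What is CEO salary?", null));
        // Fake retriever that always returns the same source, for demonstration.
        Function<String, List<String>> fakeRetriever = question -> List.of("pgvector-guide");
        System.out.println("Hit rate: " + hitRate(cases, fakeRetriever));
    }
}
```

This measures retrieval only. Answer quality (grounding, clarity, correct refusal) needs a separate check, manual or automated.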

Monitoring RAG in Production

Track:

Vector search latency
Embedding generation time
Empty retrieval count
Average similarity score
Top-K relevance
RAG fallback rate
User feedback score
Hallucination reports
Token usage
Cost per answer

Production RAG Monitoring Flow

RAG Request
   |
   +-- Retrieval Metrics
   +-- Prompt Metrics
   +-- LLM Metrics
   +-- Answer Quality Feedback
   |
   v
Observability Dashboard
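As a rough idea of what collecting retrieval metrics involves, here is a small in-memory sketch that tracks two items from the list above: empty retrieval rate and average search latency. The RagMetrics class is hypothetical; in a Spring Boot application these values would more likely be recorded as Micrometer counters and timers and exported to a dashboard.

```java
class RagMetrics {

    private long requests;
    private long emptyRetrievals;
    private long totalSearchMillis;

    // Call once per RAG request after the vector search completes.
    synchronized void observe(long searchMillis, int retrievedChunks) {
        requests++;
        totalSearchMillis += searchMillis;
        if (retrievedChunks == 0) {
            emptyRetrievals++;
        }
    }

    // Share of requests where the vector search returned no chunks.
    synchronized double emptyRetrievalRate() {
        return requests == 0 ? 0.0 : (double) emptyRetrievals / requests;
    }

    // Mean vector search latency in milliseconds.
    synchronized double averageSearchMillis() {
        return requests == 0 ? 0.0 : (double) totalSearchMillis / requests;
    }

    public static void main(String[] args) {
        RagMetrics metrics = new RagMetrics();
        metrics.observe(12, 4);
        metrics.observe(30, 0);
        System.out.println("Empty retrieval rate: " + metrics.emptyRetrievalRate());
        System.out.println("Average search ms: " + metrics.averageSearchMillis());
    }
}
```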

Common RAG Mistakes

1. Poor Chunking

Large or random chunks reduce retrieval quality.

2. No Metadata

Without metadata, results are difficult to filter, cite, and debug.

3. No Access Control

Sensitive data may leak.

4. Asking Model to Guess

RAG should instruct the model to avoid unsupported answers.

5. Too Many Retrieved Chunks

This increases token cost and may confuse the model.

6. Not Updating Embeddings

Changed documents require updated embeddings.
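One common way to detect changed documents is to store a content hash next to each document id and re-embed only when the hash differs. The sketch below assumes an in-memory map and SHA-256; the ReindexDetector class is hypothetical, and a real system would persist the hashes alongside the vectors.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

class ReindexDetector {

    // Document id -> hash of the content that was last embedded.
    private final Map<String, String> indexedHashes = new HashMap<>();

    // Hex-encoded SHA-256 of the document text.
    static String sha256(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Returns true when the document is new or its content changed, and remembers the new hash.
    boolean needsReindex(String documentId, String content) {
        String hash = sha256(content);
        String previous = indexedHashes.put(documentId, hash);
        return !hash.equals(previous);
    }

    public static void main(String[] args) {
        ReindexDetector detector = new ReindexDetector();
        System.out.println(detector.needsReindex("refund-policy", "version 1")); // new document
        System.out.println(detector.needsReindex("refund-policy", "version 1")); // unchanged
        System.out.println(detector.needsReindex("refund-policy", "version 2")); // changed
    }
}
```

When a document does change, remember to delete its old chunks from the vector store before adding the new ones, otherwise stale text can still be retrieved.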

Best Practices for RAG

Use meaningful chunks
Add metadata to every document
Use consistent embedding models
Use access control before retrieval
Keep top-k reasonable
Use clear RAG prompts
Monitor retrieval quality
Evaluate answers with real user questions
Track source documents
Re-index changed documents
Use fallback responses when context is missing

Production RAG Architecture

Frontend
   |
   v
Spring Boot AI API
   |
   +-- Authentication
   +-- Document Retrieval
   +-- VectorStore
   +-- Prompt Builder
   +-- ChatClient
   +-- Response Validator
   +-- Monitoring
   |
   v
Grounded Answer

Common Errors and Fixes

1. Empty Answers

Possible causes:

No documents loaded
Poor similarity match
Wrong vector store configuration
Embedding model mismatch

2. Wrong Answers

Possible causes:

Irrelevant chunks retrieved
Weak prompt instructions
Too much unrelated context
Documents are outdated

3. Dimension Mismatch

Vector store dimension must match embedding model dimension.

4. Slow RAG Response

Possible fixes:

Optimize vector index
Reduce top-k
Use caching
Use faster model
Shorten context

5. Hallucination Despite RAG

Fix:

Strengthen prompt rules
Use better retrieval
Add answer validation
Reject unsupported claims

Interview Questions

Q1: What is RAG?

RAG stands for Retrieval-Augmented Generation. It retrieves relevant external knowledge and gives it to a language model to generate grounded answers.

Q2: Why is RAG needed?

RAG helps AI systems answer using updated, private, and domain-specific data instead of relying only on model training knowledge.

Q3: What are the main components of RAG?

Documents, chunking, embeddings, vector store, retriever, prompt builder, and chat model.

Q4: What is the role of VectorStore in RAG?

VectorStore stores embeddings and retrieves semantically similar documents for the user question.

Q5: What is QuestionAnswerAdvisor?

QuestionAnswerAdvisor is a Spring AI Advisor that supports common RAG flows by retrieving relevant context from a vector store and adding it to the chat request.

Advanced Interview Questions

Q1: Manual RAG vs Advisor-based RAG?

Manual RAG gives full control over retrieval and prompt construction, while Advisor-based RAG provides cleaner integration for common use cases.

Q2: How do you reduce hallucination in RAG?

Use high-quality retrieval, strong prompts, response validation, source tracking, and instructions to avoid guessing.

Q3: Why is metadata important in RAG?

Metadata supports filtering, tenant isolation, source citation, debugging, and access control.

Q4: What happens if the embedding model changes?

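Existing vectors usually need to be regenerated. A different embedding model may produce vectors with a different number of dimensions, and even with the same dimensions its vector space is different, so old and new embeddings cannot be compared meaningfully.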

Q5: How do you secure multi-tenant RAG?

Authenticate users, enforce authorization, apply tenant metadata filters, and retrieve only allowed documents.

Summary

Retrieval-Augmented Generation is a powerful architecture for building reliable AI applications that answer using your own knowledge base. Instead of depending only on the model's internal knowledge, RAG retrieves relevant documents and uses them as context for the final answer.

In Spring AI, RAG can be implemented using embeddings, VectorStore, ChatClient, and Advisor APIs such as QuestionAnswerAdvisor.

For production systems such as learning platforms, banking assistants, e-commerce support bots, SaaS knowledge bases, and enterprise AI agents, RAG improves factual accuracy, context awareness, user trust, and answer quality.

A strong RAG system depends on good chunking, high-quality embeddings, secure retrieval, metadata filtering, clear prompts, monitoring, and regular evaluation.