Published: 2026-06-01 • Updated: 2026-06-20

Integrating Pinecone and Cloud Vector Databases with Spring AI

Cloud vector databases are widely used in modern AI applications because they help store embeddings, perform semantic similarity search, and power Retrieval-Augmented Generation systems at scale. When a Spring AI application needs fast semantic search across thousands, millions, or even billions of vectors, a managed vector database such as Pinecone can reduce infrastructure complexity.

Spring AI provides a VectorStore abstraction for working with vector databases, and it includes Pinecone integration for storing document embeddings and performing similarity searches. Pinecone is a cloud-based vector database designed for efficient vector storage and search. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/vectordbs/pinecone.html))


What is Pinecone?

Pinecone is a managed cloud vector database used to store and search embeddings. It is commonly used for semantic search, RAG, recommendation systems, AI agents, document search, and enterprise knowledge assistants.

Instead of manually managing database extensions (such as pgvector for PostgreSQL), index tuning, scaling, and other vector infrastructure, you use Pinecone as a managed platform: you create indexes, insert vectors, attach metadata, and run similarity searches.


Why Use Pinecone with Spring AI?

  • Managed vector database service
  • No need to manage database servers manually
  • Good for scalable RAG systems
  • Supports metadata with vector records
  • Supports semantic similarity search
  • Useful for multi-tenant AI platforms
  • Works with the Spring AI VectorStore abstraction

Spring AI + Pinecone Architecture

User Question
      |
      v
Spring Boot API
      |
      v
Embedding Model
      |
      v
Pinecone Vector Store
      |
      v
Relevant Documents Retrieved
      |
      v
ChatClient
      |
      v
Grounded AI Answer

Pinecone vs Self-Managed Vector Databases

Pinecone | Self-Managed Vector Database
Managed cloud service | You manage infrastructure
Less operational overhead | More control over deployment
Good for scale and production RAG | Good for private/on-prem requirements
Requires API key and cloud access | Can run inside private network
Provider-managed scaling | You handle scaling and tuning

Real-World Learning Platform Example

Suppose your learning platform has thousands of course lessons, interview questions, projects, and tutorials.

A user searches:

I want to learn how to deploy Java microservices to the cloud.

Pinecone can retrieve semantically related content such as:

  • Spring Boot Microservices
  • Docker Deployment
  • Kubernetes Autoscaling
  • CI/CD Pipelines
  • AWS Deployment

This improves search quality beyond exact keyword matching.


Real-World Banking Example

A banking AI assistant may store policy documents, FAQ content, transaction issue guides, credit card rules, and loan documents in Pinecone.

A user asks:

Amount deducted but UPI transaction failed. When will I get it back?

Pinecone can retrieve the most relevant failed-payment reversal policy, which the assistant then uses to produce a grounded answer.


Real-World E-Commerce Example

An e-commerce platform can use Pinecone for:

  • Product recommendation
  • Refund policy search
  • Delivery support
  • Warranty question answering
  • Customer support automation

A user asks:

Can I return a broken product after delivery?

Vector search can retrieve the return policy and damaged-item policy documents even when the question does not contain their exact keywords.


Step 1: Create a Pinecone Account and API Key

Create a Pinecone account, create an API key, and keep the key secure. Do not hardcode it in Java code, Git repositories, or frontend applications.

Use environment variables or secret managers for production deployments.

export PINECONE_API_KEY=your_pinecone_api_key_here

Step 2: Create a Pinecone Index

A Pinecone index stores vectors. Each record in a Pinecone index contains an ID and a vector, and can also include metadata for additional context. Pinecone metadata can be used later as a filter during search. ([docs.pinecone.io](https://docs.pinecone.io/guides/index-data/indexing-overview))

When creating an index, choose:

  • Index name
  • Vector dimension
  • Similarity metric
  • Cloud provider
  • Region

Important: Dimension Must Match Embedding Model

The vector dimension in Pinecone must match the embedding model output dimension.

Embedding Model | Example Dimension
OpenAI text-embedding-3-small | 1536
Some local embedding models | 768
Other providers | Depends on model

If the dimensions do not match, vector insert or search operations will fail.
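To catch a mismatch at startup instead of on the first insert, you can compare the dimension you chose for the index with the length of one real embedding. The sketch below is plain Java; the DimensionGuard class and its small lookup table are illustrative assumptions, not part of Spring AI or Pinecone.

```java
import java.util.Map;
import java.util.Optional;

/** Fail-fast guard comparing the Pinecone index dimension with a real embedding's length. */
final class DimensionGuard {

    // Illustrative lookup table (assumption); verify values against your provider's docs.
    private static final Map<String, Integer> KNOWN_DIMENSIONS = Map.of(
            "text-embedding-3-small", 1536
    );

    private DimensionGuard() {
    }

    /** Returns the documented output dimension for a model name, if this table knows it. */
    static Optional<Integer> knownDimension(String modelName) {
        return Optional.ofNullable(KNOWN_DIMENSIONS.get(modelName));
    }

    /** Throws if the probe embedding length differs from the index dimension. */
    static void verify(int indexDimension, float[] probeEmbedding) {
        if (probeEmbedding.length != indexDimension) {
            throw new IllegalStateException(
                    "Embedding dimension " + probeEmbedding.length
                            + " does not match Pinecone index dimension " + indexDimension
                            + ". Recreate the index or switch the embedding model.");
        }
    }
}
```

In a Spring Boot application you could call DimensionGuard.verify(...) from an ApplicationRunner, passing the result of embeddingModel.embed("dimension probe") (in Spring AI 1.0, EmbeddingModel.embed(String) returns a float[]; confirm this for your version) together with the dimension you used when creating the index.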


Step 3: Create Spring Boot Project

Create a Spring Boot project with:

  • Java 17 or later
  • Spring Web
  • Spring Boot Actuator
  • Spring AI Pinecone Vector Store starter
  • Spring AI embedding model starter
  • Spring AI chat model starter

Project Structure

spring-ai-pinecone-demo/
|
|-- src/main/java/com/dhanish/pinecone/
|   |
|   |-- SpringAiPineconeApplication.java
|   |-- controller/
|   |   |-- KnowledgeController.java
|   |   |-- RagController.java
|   |
|   |-- service/
|       |-- KnowledgeIngestionService.java
|       |-- SemanticSearchService.java
|       |-- RagAnswerService.java
|
|-- src/main/resources/
|   |-- application.properties
|
|-- pom.xml

Step 4: Add Spring AI BOM

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Step 5: Add Pinecone Vector Store Dependency

Spring AI provides a Pinecone vector store starter artifact named spring-ai-starter-vector-store-pinecone. ([central.sonatype.com](https://central.sonatype.com/artifact/org.springframework.ai/spring-ai-starter-vector-store-pinecone))

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pinecone</artifactId>
</dependency>

Step 6: Add Embedding and Chat Model Dependencies

Example using OpenAI:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

You can also use other embedding providers depending on your architecture.


Step 7: Configure application.properties

spring.application.name=spring-ai-pinecone-demo

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.model.embedding=openai
spring.ai.openai.embedding.options.model=text-embedding-3-small

spring.ai.model.chat=openai
spring.ai.openai.chat.options.model=gpt-4o-mini

spring.ai.vectorstore.pinecone.api-key=${PINECONE_API_KEY}
spring.ai.vectorstore.pinecone.index-name=dhanish-knowledge-index
spring.ai.vectorstore.pinecone.environment=us-east-1
spring.ai.vectorstore.pinecone.project-id=your-project-id
spring.ai.vectorstore.pinecone.namespace=default

Property names may vary slightly by Spring AI version. Always verify with the Spring AI Pinecone reference for the version used in your project. ([docs.spring.io](https://docs.spring.io/spring-ai/reference/api/vectordbs/pinecone.html))


Step 8: Create Main Application Class

package com.dhanish.pinecone;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringAiPineconeApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringAiPineconeApplication.class, args);
    }
}

Step 9: Create Knowledge Ingestion Service

package com.dhanish.pinecone.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;

@Service
public class KnowledgeIngestionService {

    private final VectorStore vectorStore;

    public KnowledgeIngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void addSampleDocuments() {

        Document springAiDoc = new Document(
                "Spring AI helps Java developers build AI applications using chat models, embeddings, vector stores, RAG, and AI agents.",
                Map.of(
                        "topic", "spring-ai",
                        "category", "ai",
                        "source", "internal-course"
                )
        );

        Document pineconeDoc = new Document(
                "Pinecone is a managed cloud vector database used for semantic search, RAG, recommendations, and AI agent memory.",
                Map.of(
                        "topic", "pinecone",
                        "category", "vector-database",
                        "source", "internal-course"
                )
        );

        Document ragDoc = new Document(
                "Retrieval-Augmented Generation retrieves relevant documents from a vector store and sends them to a chat model for grounded answers.",
                Map.of(
                        "topic", "rag",
                        "category", "architecture",
                        "source", "internal-course"
                )
        );

        vectorStore.add(List.of(springAiDoc, pineconeDoc, ragDoc));
    }
}
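Each call to /load in this sample creates documents with randomly generated IDs, so loading twice stores duplicates. Because a Pinecone upsert overwrites a record that has the same ID, deriving the ID from something stable makes ingestion repeatable. A minimal sketch, assuming a hypothetical StableIds helper and a "source#chunkIndex" key format:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Derives stable record IDs so re-running ingestion overwrites records instead of duplicating them. */
final class StableIds {

    private StableIds() {
    }

    /**
     * Builds a name-based (version 3) UUID from a source identifier and chunk index.
     * The same inputs always produce the same ID; the "source#index" format is an assumption.
     */
    static String forChunk(String sourceId, int chunkIndex) {
        String key = sourceId + "#" + chunkIndex;
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

You could then build documents with the three-argument constructor, for example new Document(StableIds.forChunk("internal-course/pinecone", 0), text, metadata); check that your Spring AI version exposes Document(String id, String text, Map<String, Object> metadata).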

Step 10: Create Semantic Search Service

package com.dhanish.pinecone.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class SemanticSearchService {

    private final VectorStore vectorStore;

    public SemanticSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> search(String query) {
        return vectorStore.similaritySearch(query);
    }
}

Step 11: Create Knowledge Controller

package com.dhanish.pinecone.controller;

import com.dhanish.pinecone.service.KnowledgeIngestionService;
import com.dhanish.pinecone.service.SemanticSearchService;
import org.springframework.ai.document.Document;
import org.springframework.web.bind.annotation.*;

import java.util.List;

@RestController
@RequestMapping("/api/pinecone")
public class KnowledgeController {

    private final KnowledgeIngestionService ingestionService;
    private final SemanticSearchService searchService;

    public KnowledgeController(KnowledgeIngestionService ingestionService,
                               SemanticSearchService searchService) {
        this.ingestionService = ingestionService;
        this.searchService = searchService;
    }

    @PostMapping("/load")
    public String load() {
        ingestionService.addSampleDocuments();
        return "Documents added to Pinecone successfully.";
    }

    @GetMapping("/search")
    public List<Document> search(@RequestParam String query) {
        return searchService.search(query);
    }
}

Step 12: Test Document Ingestion

curl -X POST http://localhost:8080/api/pinecone/load

Expected Response

Documents added to Pinecone successfully.

Step 13: Test Semantic Search

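curl -G "http://localhost:8080/api/pinecone/search" --data-urlencode "query=How can I build AI search in Java?"

The -G and --data-urlencode options make curl URL-encode the spaces and question mark in the query before appending it to the URL.

The results should include documents related to Spring AI, vector databases, or RAG, even if the exact query words do not appear in the stored documents.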


Semantic Search Flow

User Query
     |
     v
Embedding Generated
     |
     v
Pinecone Similarity Search
     |
     v
Relevant Documents Returned
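Pinecone and Spring AI perform this ranking for you, but a toy version shows what "similarity" means. The plain-Java sketch below scores stored vectors against a query vector with cosine similarity and keeps the top K. It uses tiny 2-dimensional vectors and a linear scan purely for illustration; Pinecone uses its own indexing and the similarity metric you chose when creating the index.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy in-memory illustration of similarity search; not how Pinecone is implemented. */
final class ToySimilaritySearch {

    private ToySimilaritySearch() {
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns NaN if either vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the IDs of the topK stored vectors most similar to the query, best first. */
    static List<String> topK(float[] query, Map<String, float[]> stored, int topK) {
        return stored.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> cosine(query, e.getValue())).reversed())
                .limit(topK)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With a query vector of [1, 0] and stored vectors "spring-ai" = [1, 0], "pinecone" = [0.9, 0.1], and "cooking" = [0, 1], topK(query, stored, 2) returns spring-ai followed by pinecone.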

Step 14: Build RAG with Pinecone and ChatClient

Pinecone is commonly used as the retrieval layer for RAG systems.

User Question
      |
      v
Search Pinecone
      |
      v
Retrieve Related Documents
      |
      v
Send Context to Chat Model
      |
      v
Generate Grounded Answer

RAG Answer Service Example

package com.dhanish.pinecone.service;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

@Service
public class RagAnswerService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public RagAnswerService(VectorStore vectorStore,
                            ChatClient.Builder builder) {
        this.vectorStore = vectorStore;
        this.chatClient = builder.build();
    }

    public String answer(String question) {

        List<Document> documents =
                vectorStore.similaritySearch(question);

        String context = documents.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n\n"));

        return chatClient.prompt()
                .system("""
                        You are a helpful AI assistant.

                        Use only the provided context.
                        If the answer is not available in the context,
                        say: I do not have enough information.
                        """)
                .user("""
                      Context:
                      %s

                      Question:
                      %s
                      """.formatted(context, question))
                .call()
                .content();
    }
}
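The service above joins raw chunk text, so the model cannot tell which source each passage came from. One option, sketched below under the assumption that every document carries a "source" metadata value (as the ingestion example does), is to number the chunks and cap the total context size. ContextBuilder and the character budget are hypothetical, and characters are only a rough proxy for tokens.

```java
import java.util.List;

/** Builds a numbered, size-capped context block so the model can cite its sources. */
final class ContextBuilder {

    /** Minimal stand-in for a retrieved chunk: its text and its "source" metadata value. */
    record Chunk(String text, String source) {
    }

    private ContextBuilder() {
    }

    /**
     * Formats chunks as "[n] (source) text", separated by blank lines, and stops
     * before the total length would exceed maxChars (assumed proxy for a token limit).
     */
    static String build(List<Chunk> chunks, int maxChars) {
        StringBuilder context = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            String entry = "[" + (i + 1) + "] (" + c.source() + ") " + c.text();
            int separator = context.length() == 0 ? 0 : 2;
            if (context.length() + separator + entry.length() > maxChars) {
                break; // keep the highest-ranked chunks and drop the rest
            }
            if (separator > 0) {
                context.append("\n\n");
            }
            context.append(entry);
        }
        return context.toString();
    }
}
```

Inside answer(...) you could map each Document to new ContextBuilder.Chunk(doc.getText(), String.valueOf(doc.getMetadata().get("source"))) and ask the model, in the system prompt, to cite the bracketed numbers it relies on.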

RAG Controller Example

package com.dhanish.pinecone.controller;

import com.dhanish.pinecone.service.RagAnswerService;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final RagAnswerService ragAnswerService;

    public RagController(RagAnswerService ragAnswerService) {
        this.ragAnswerService = ragAnswerService;
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return ragAnswerService.answer(question);
    }
}

Test RAG API

curl -G "http://localhost:8080/api/rag/ask" --data-urlencode "question=What is Pinecone used for?"

Metadata in Pinecone

Metadata is essential for production vector search. Pinecone records can include metadata key-value pairs, and search queries can use metadata filters to limit results. ([docs.pinecone.io](https://docs.pinecone.io/guides/search/filter-by-metadata))

Example Metadata

{
  "topic": "spring-ai",
  "category": "ai",
  "tenantId": "tenant-a",
  "source": "course-content",
  "language": "english"
}

Why Does Metadata Matter?

  • Filter by category
  • Filter by tenant
  • Filter by language
  • Track source document
  • Support citations
  • Improve debugging
  • Improve security

Namespace Strategy in Pinecone

Pinecone supports namespaces, and its documentation describes a common multi-tenancy approach where a serverless index uses one namespace per tenant to isolate tenant data. ([docs.pinecone.io](https://docs.pinecone.io/guides/index-data/implement-multitenancy))

pinecone-index
   |
   +-- namespace: tenant-a
   +-- namespace: tenant-b
   +-- namespace: tenant-c

Namespace vs Metadata Filtering

Approach | Best Use Case
Namespace | Tenant or customer-level isolation
Metadata Filter | Category, language, document type, source filtering
Both Together | Strong multi-tenant enterprise search

Multi-Tenant RAG Example

Tenant A User
     |
     v
Search only Tenant A namespace

Tenant B User
     |
     v
Search only Tenant B namespace

As long as the backend selects the namespace from the authenticated user's tenant, and never from request input, one customer cannot retrieve another customer’s documents.


Cloud Vector Database Options

Vector Database | Best Fit
Pinecone | Managed vector search and scalable RAG
MongoDB Atlas Vector Search | Teams already using MongoDB
Weaviate Cloud | Semantic search and hybrid search
Qdrant Cloud | Vector search with filtering
Milvus / Zilliz Cloud | Large-scale vector workloads
Redis Cloud | Low-latency vector search
Elasticsearch Cloud | Hybrid keyword + vector search

When to Use Pinecone

  • You want managed vector infrastructure
  • You expect search traffic to grow
  • You do not want to tune vector indexes manually
  • You need metadata filtering
  • You are building production RAG
  • You need multi-tenant vector search
  • You want cloud-native semantic search

When to Use PGVector Instead

  • You already use PostgreSQL heavily
  • Your dataset is small to medium
  • You want simpler infrastructure
  • You want local development without external vector DB
  • You prefer SQL-based data management

When to Use MongoDB Atlas Vector Search

  • You already store application data in MongoDB
  • You want document database + vector search together
  • You need JSON document flexibility
  • You want managed cloud vector search inside MongoDB ecosystem

Production Document Ingestion Pipeline

Document Uploaded
      |
      v
File Validation
      |
      v
Text Extraction
      |
      v
Chunking
      |
      v
Embedding Generation
      |
      v
Metadata Added
      |
      v
Stored in Pinecone
      |
      v
Ready for RAG

Chunking Strategy

Good chunking improves vector search quality.

  • Keep chunks meaningful
  • Avoid random splitting
  • Use headings when possible
  • Add small overlap for long documents
  • Store source metadata
  • Test retrieval quality with real queries

Embedding Model Strategy

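Use one embedding model consistently per index. If you change the embedding model, you need to regenerate embeddings for all documents and re-index them: vectors from different models are not comparable, even when their dimensions happen to match.

Old embedding model → old vector space
New embedding model → new vector space

Do not mix vectors from different embedding models in the same index or namespace.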
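One way to avoid mixing vector spaces is to record which embedding model produced each vector and check it before trusting a match. The sketch assumes a hypothetical "embeddingModel" metadata key added at ingestion time; during a migration, the same key could also drive a metadata filter so that queries only see vectors from the current model.

```java
import java.util.HashMap;
import java.util.Map;

/** Tags record metadata with the embedding model name and checks compatibility at query time. */
final class EmbeddingModelTag {

    static final String METADATA_KEY = "embeddingModel"; // hypothetical metadata key

    private EmbeddingModelTag() {
    }

    /** Returns a copy of the metadata with the embedding model name added. */
    static Map<String, Object> tag(Map<String, Object> metadata, String modelName) {
        Map<String, Object> copy = new HashMap<>(metadata);
        copy.put(METADATA_KEY, modelName);
        return copy;
    }

    /** True only if the record was embedded with the model currently used for queries. */
    static boolean compatible(Map<String, Object> recordMetadata, String queryModel) {
        return queryModel.equals(recordMetadata.get(METADATA_KEY));
    }
}
```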

Security Best Practices

  • Never expose Pinecone API key in frontend code
  • Store API key in environment variables or secret managers
  • Use namespaces for tenant isolation
  • Use metadata filters for permissions
  • Do not store unnecessary sensitive data
  • Sanitize logs
  • Apply backend authorization before search
  • Monitor unusual query patterns

Safe Retrieval Flow

User Request
      |
      v
Authentication
      |
      v
Authorization Check
      |
      v
Select Namespace
      |
      v
Apply Metadata Filter
      |
      v
Vector Search
      |
      v
Allowed Documents Only
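The flow above can be expressed as a single method that refuses to search until each check has passed. This is a sketch, not a Spring Security integration: Principal, NamespacedSearch, and the KNOWLEDGE_READER role are hypothetical. Because the Spring AI Pinecone store in this article is configured with a single namespace property, a per-tenant namespace would typically mean one store per namespace, or a direct call to the Pinecone client, behind the NamespacedSearch port.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Sketch of the safe retrieval order: authenticate, authorize, select namespace, filter, search. */
final class SafeRetrieval {

    /** Caller identity as established by your security layer (hypothetical shape). */
    record Principal(String userId, String tenantId, Set<String> roles) {
    }

    /** Hypothetical search port, for example backed by one VectorStore per namespace. */
    interface NamespacedSearch {
        List<String> search(String namespace, String filterExpression, String query);
    }

    private static final Pattern SAFE_TENANT = Pattern.compile("[A-Za-z0-9_-]{1,64}");
    private static final String READ_ROLE = "KNOWLEDGE_READER"; // hypothetical role name

    private final NamespacedSearch search;

    SafeRetrieval(NamespacedSearch search) {
        this.search = search;
    }

    List<String> retrieve(Principal principal, String requestedTenant, String query) {
        // 1. Authentication: the security layer must have produced a principal.
        if (principal == null) {
            throw new SecurityException("Unauthenticated request");
        }
        // 2. Authorization: callers may only read their own tenant and need the read role.
        if (!principal.tenantId().equals(requestedTenant) || !principal.roles().contains(READ_ROLE)) {
            throw new SecurityException("Access denied");
        }
        if (!SAFE_TENANT.matcher(principal.tenantId()).matches()) {
            throw new SecurityException("Invalid tenant id");
        }
        // 3. Select the namespace from the verified identity, never from free-form input.
        String namespace = principal.tenantId();
        // 4. Apply a metadata filter as a second guard inside the namespace.
        String filter = "tenantId == '" + principal.tenantId() + "'";
        // 5. Vector search over the allowed documents only.
        return search.search(namespace, filter, query);
    }
}
```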

Monitoring Pinecone-Based RAG

Track:

  • Vector search latency
  • Embedding generation latency
  • Top-K result quality
  • Empty search results
  • Average similarity score
  • RAG fallback rate
  • Namespace usage
  • API errors
  • Cost and usage trends
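Several of these signals can be captured with a few counters around each search call. The plain-Java sketch below tracks average vector search latency and the empty-result rate; the RetrievalMetrics class is hypothetical, and in a Spring Boot application with Actuator you would more likely publish the same numbers through Micrometer Timer and Counter meters. Spring AI also ships Micrometer-based observability for vector store operations; check the observability section of the reference for your version.

```java
import java.util.concurrent.atomic.LongAdder;

/** Minimal thread-safe counters for search latency and the share of searches with no results. */
final class RetrievalMetrics {

    private final LongAdder searches = new LongAdder();
    private final LongAdder emptySearches = new LongAdder();
    private final LongAdder totalLatencyMillis = new LongAdder();

    /** Call once per similarity search with its measured latency and result count. */
    void record(long latencyMillis, int resultCount) {
        searches.increment();
        totalLatencyMillis.add(latencyMillis);
        if (resultCount == 0) {
            emptySearches.increment();
        }
    }

    /** Mean latency over all recorded searches, or 0 if none were recorded. */
    double averageLatencyMillis() {
        long n = searches.sum();
        return n == 0 ? 0.0 : (double) totalLatencyMillis.sum() / n;
    }

    /** Fraction of searches that returned no documents, or 0 if none were recorded. */
    double emptyResultRate() {
        long n = searches.sum();
        return n == 0 ? 0.0 : (double) emptySearches.sum() / n;
    }
}
```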

Pinecone documentation includes guidance areas for increasing relevance, throughput, decreasing latency, and monitoring usage and costs. ([docs.pinecone.io](https://docs.pinecone.io/guides/search/search-overview))


Common Errors and Fixes

1. Invalid API Key

Error:
Unauthorized or invalid API key

Fix:

  • Check PINECONE_API_KEY
  • Verify environment variable is loaded
  • Do not include extra spaces
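A startup check can turn these fixes into an immediate, readable failure. This sketch validates the raw value read with System.getenv("PINECONE_API_KEY"); the ApiKeyCheck class is hypothetical, and passing it does not prove that Pinecone accepts the key, only that it is present and has no stray whitespace.

```java
/** Fail-fast validation of the PINECONE_API_KEY value before the application serves traffic. */
final class ApiKeyCheck {

    private ApiKeyCheck() {
    }

    /**
     * Returns the key unchanged if it is non-blank and has no surrounding whitespace.
     * Error messages never include the key itself, so they are safe to log.
     */
    static String validate(String rawKey) {
        if (rawKey == null || rawKey.isBlank()) {
            throw new IllegalStateException("PINECONE_API_KEY is missing or blank");
        }
        if (!rawKey.equals(rawKey.strip())) {
            throw new IllegalStateException("PINECONE_API_KEY has leading or trailing whitespace");
        }
        return rawKey;
    }
}
```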

2. Index Not Found

Fix:

  • Confirm index name
  • Confirm region/environment
  • Confirm project/account

3. Dimension Mismatch

Expected dimension 1536 but received 768

Fix:

  • Check embedding model dimension
  • Create Pinecone index with matching dimension
  • Rebuild index if model changes

4. Empty Search Results

Possible causes:

  • No documents indexed
  • Wrong namespace
  • Too strict metadata filter
  • Poor chunking
  • Wrong embedding model

5. High Latency

Possible fixes:

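  • Reduce top-k
  • Use more selective metadata filters
  • Choose a Pinecone region close to your application
  • Store only the metadata fields you need for filtering and display
  • Monitor embedding model latency, which adds to every query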

Best Practices

  • Match index dimension with embedding model
  • Use meaningful chunks
  • Store useful metadata
  • Use namespaces for tenant isolation
  • Use metadata filters for category and permissions
  • Keep embedding model consistent
  • Monitor search latency and retrieval quality
  • Use RAG prompts that avoid guessing
  • Protect API keys securely
  • Re-index when document content changes significantly

Production Architecture

Frontend
   |
   v
Spring Boot AI API
   |
   +-- Document Ingestion Service
   +-- Embedding Service
   +-- Pinecone VectorStore
   +-- RAG Answer Service
   +-- Monitoring Layer
   |
   v
Pinecone Cloud Vector Database

Interview Questions

Q1: What is Pinecone?

Pinecone is a managed cloud vector database used to store and search embeddings for semantic search, RAG, recommendations, and AI agents.

Q2: Why use Pinecone with Spring AI?

Spring AI provides a Pinecone VectorStore integration, making it easier to store documents, generate embeddings, and perform similarity search from Spring Boot applications.

Q3: What is a Pinecone index?

A Pinecone index is a container that stores vector records and supports vector similarity search.

Q4: Why does vector dimension matter?

The Pinecone index dimension must match the embedding model output dimension, otherwise vector operations may fail.

Q5: What is metadata filtering?

Metadata filtering limits search results to records matching specific metadata conditions, such as category, language, tenant, or source.


Advanced Interview Questions

Q1: Namespace vs metadata filter in Pinecone?

Namespaces are useful for tenant-level isolation, while metadata filters are useful for filtering records by attributes such as category, source, language, or permission.

Q2: Why use cloud vector databases?

They reduce infrastructure management and support scalable semantic search for production AI applications.

Q3: What happens if you change embedding models?

You may need to regenerate embeddings and rebuild the index because the vector space or dimension can change.

Q4: How do you secure multi-tenant vector search?

Authenticate users, select the correct namespace, apply metadata filters, and enforce authorization before search.

Q5: How does Pinecone support RAG?

Pinecone retrieves semantically relevant document chunks, which are then added to the chat model prompt for grounded answers.



Summary

Pinecone and other cloud vector databases help Spring AI applications perform semantic search and build scalable RAG systems without managing vector infrastructure manually.

Spring AI’s VectorStore abstraction makes it easier to integrate Pinecone into Java applications for document ingestion, embedding storage, similarity search, and grounded AI responses.

For production systems, focus on correct embedding dimensions, meaningful chunking, metadata strategy, namespace design, secure API key management, monitoring, and tenant-level authorization.

Cloud vector databases are especially useful for learning platforms, banking support assistants, e-commerce support systems, SaaS knowledge bases, customer support bots, and enterprise AI agents that need fast and reliable semantic retrieval.

About the Author

Naresh Kumar

Senior Java Backend Engineer experienced in Banking, Payments, ISO 20022, Spring Boot, Microservices, Kafka, Docker, Kubernetes, AWS and Cloud Native Systems.

Built enterprise payment solutions, transaction processing systems, API platforms and scalable microservices used in production.
