PostgreSQL + pgvector Vector Search: Building RAG Retrieval Without a Vector Database in 2026

When you think of RAG, do you immediately reach for Pinecone, Milvus, or Weaviate? Introducing a whole new database system just for simple semantic search doubles your operational costs and makes data synchronization a headache. In 2026, PostgreSQL's pgvector extension is mature enough — your existing PostgreSQL can handle vector search without needing a separate vector database.

Background

Vector Search Basics

The core of vector search is transforming unstructured data like text and images into high-dimensional vectors (Embeddings), then finding semantically similar results by calculating distances between vectors (cosine similarity, Euclidean distance, etc.).

Concept	Description	Example
Embedding	Maps data to vectors	text-embedding-3-large outputs 3072-dimensional vectors
Cosine Similarity	Cosine of the angle between vectors, range [-1,1]	1 means identical
Euclidean Distance	Straight-line distance between vectors	0 means identical
Inner Product	Dot product of vectors	Suitable for normalized vectors
ANN Search	Approximate Nearest Neighbor, trades accuracy for speed	HNSW, IVFFlat

Index Types Supported by pgvector

Index	Algorithm	Build Speed	Query Speed	Recall	Memory Usage	Use Case
HNSW	Graph index	Slow	Fast	High	High	Small-medium scale, high recall
IVFFlat	Inverted file index	Fast	Medium	Medium	Low	Large scale, acceptable accuracy loss

Problem Analysis

Why Not Use a Dedicated Vector Database?

Operational Cost: Adding a new database means extra work for backups, monitoring, upgrades, and high availability
Data Consistency: Business data in PostgreSQL, vector data in a vector database — dual-write consistency is hard to guarantee
Query Capability: Vector databases have limited filtering capabilities and can't do complex join queries
Migration Risk: The vector database market is fragmented — Pinecone/Milvus/Weaviate APIs are incompatible

pgvector lets you do vector search + business filtering + join queries all within PostgreSQL — a one-stop solution.

Step-by-Step Guide

Step 1: Install the pgvector Extension

-- Connect to PostgreSQL
psql -U postgres -d mydb

-- Install the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify version
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- 0.8.0+

Using Docker:

# docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Step 2: Create a Vector Table

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    source TEXT,
    category TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    embedding vector(1536)
);

-- Create HNSW index (recommended for high-recall scenarios)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Create IVFFlat index (recommended for large-scale data)
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);

Step 3: Generate and Store Vectors

import psycopg2
from openai import OpenAI

client = OpenAI()
conn = psycopg2.connect("dbname=mydb user=postgres password=password")
cur = conn.cursor()

def generate_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def insert_document(title: str, content: str, source: str, category: str):
    full_text = f"{title}\n{content}"
    embedding = generate_embedding(full_text)

    cur.execute(
        "INSERT INTO documents (title, content, source, category, embedding) VALUES (%s, %s, %s, %s, %s)",
        (title, content, source, category, str(embedding)),
    )
    conn.commit()

insert_document(
    "K8s Gateway API Guide",
    "Gateway API is the next-generation traffic management standard for Kubernetes...",
    "toolsku.dev/blog",
    "DevOps",
)

Step 4: Vector Search Queries

-- Cosine similarity search
SELECT id, title, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE 1 - (embedding <=> $1) > 0.7
ORDER BY embedding <=> $1
LIMIT 10;

-- Vector search with filtering
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'DevOps'
  AND 1 - (embedding <=> $1) > 0.7
ORDER BY embedding <=> $1
LIMIT 10;

Step 5: Hybrid Query (Vector + Full-Text)

-- Create full-text search index
ALTER TABLE documents ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('simple', coalesce(title, '') || ' ' || coalesce(content, ''))) STORED;

CREATE INDEX ON documents USING gin(tsv);

-- Hybrid query: vector search + full-text search + RRF fusion
WITH vector_results AS (
    SELECT id, title,
           ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS vector_rank,
           1 - (embedding <=> $1) AS vector_score
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 50
),
text_results AS (
    SELECT id, title,
           ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, plainto_tsquery('simple', $2)) DESC) AS text_rank,
           ts_rank(tsv, plainto_tsquery('simple', $2)) AS text_score
    FROM documents
    WHERE tsv @@ plainto_tsquery('simple', $2)
    LIMIT 50
),
combined AS (
    SELECT
        COALESCE(v.id, t.id) AS id,
        COALESCE(v.title, t.title) AS title,
        COALESCE(1.0 / (60 + v.vector_rank), 0) +
        COALESCE(1.0 / (60 + t.text_rank), 0) AS rrf_score
    FROM vector_results v
    FULL OUTER JOIN text_results t ON v.id = t.id
)
SELECT id, title, rrf_score
FROM combined
ORDER BY rrf_score DESC
LIMIT 10;

Complete Code: RAG Retrieval System

import psycopg2
from openai import OpenAI
from dataclasses import dataclass

@dataclass
class SearchResult:
    id: int
    title: str
    content: str
    score: float
    source: str

class PgVectorRAG:
    def __init__(self, db_url: str, openai_api_key: str):
        self.conn = psycopg2.connect(db_url)
        self.client = OpenAI(api_key=openai_api_key)

    def embed(self, text: str) -> list[float]:
        response = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding

    def search(
        self,
        query: str,
        top_k: int = 10,
        threshold: float = 0.6,
        category: str | None = None,
        use_hybrid: bool = True,
    ) -> list[SearchResult]:
        query_embedding = self.embed(query)
        embedding_str = str(query_embedding)

        if use_hybrid:
            return self._hybrid_search(embedding_str, query, top_k, category)
        return self._vector_search(embedding_str, top_k, threshold, category)

    def _vector_search(
        self,
        embedding_str: str,
        top_k: int,
        threshold: float,
        category: str | None,
    ) -> list[SearchResult]:
        sql = """
            SELECT id, title, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            WHERE 1 - (embedding <=> %s::vector) > %s
        """
        params: list = [embedding_str, embedding_str, threshold]

        if category:
            sql += " AND category = %s"
            params.append(category)

        sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
        params.extend([embedding_str, top_k])

        cur = self.conn.cursor()
        cur.execute(sql, params)
        results = cur.fetchall()

        return [
            SearchResult(id=r[0], title=r[1], content=r[2], source=r[3], score=r[4])
            for r in results
        ]

    def _hybrid_search(
        self,
        embedding_str: str,
        query_text: str,
        top_k: int,
        category: str | None,
    ) -> list[SearchResult]:
        category_filter = "AND category = %s" if category else ""
        params: list = [embedding_str, embedding_str]

        if category:
            params.append(category)

        params.extend([embedding_str, query_text])

        if category:
            params.append(category)

        params.extend([embedding_str, top_k])

        sql = f"""
        WITH vector_results AS (
            SELECT id, title, content, source,
                   ROW_NUMBER() OVER (ORDER BY embedding <=> %s::vector) AS v_rank
            FROM documents
            WHERE 1 - (embedding <=> %s::vector) > 0.5
            {category_filter}
            ORDER BY embedding <=> %s::vector
            LIMIT 50
        ),
        text_results AS (
            SELECT id, title, content, source,
                   ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, plainto_tsquery('simple', %s)) DESC) AS t_rank
            FROM documents
            WHERE tsv @@ plainto_tsquery('simple', %s)
            {category_filter}
            LIMIT 50
        ),
        combined AS (
            SELECT
                COALESCE(v.id, t.id) AS id,
                COALESCE(v.title, t.title) AS title,
                COALESCE(v.content, t.content) AS content,
                COALESCE(v.source, t.source) AS source,
                COALESCE(1.0 / (60 + v.v_rank), 0) +
                COALESCE(1.0 / (60 + t.t_rank), 0) AS rrf_score
            FROM vector_results v
            FULL OUTER JOIN text_results t ON v.id = t.id
        )
        SELECT id, title, content, source, rrf_score
        FROM combined
        ORDER BY rrf_score DESC
        LIMIT %s
        """

        cur = self.conn.cursor()
        cur.execute(sql, params)
        results = cur.fetchall()

        return [
            SearchResult(id=r[0], title=r[1], content=r[2], source=r[3], score=r[4])
            for r in results
        ]

    def generate_answer(self, query: str, top_k: int = 5) -> str:
        results = self.search(query, top_k=top_k, use_hybrid=True)
        context = "\n\n".join(f"[{r.title}]\n{r.content}" for r in results)

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Answer the question based on the following reference materials. If the reference materials don't contain relevant information, please say so.\n\n" + context},
                {"role": "user", "content": query},
            ],
        )
        return response.choices[0].message.content


if __name__ == "__main__":
    rag = PgVectorRAG(
        db_url="dbname=mydb user=postgres password=password",
        openai_api_key="sk-xxx",
    )

    answer = rag.generate_answer("How to configure K8s Gateway API?")
    print(answer)

Common Pitfalls

Pitfall 1: HNSW Index Build Takes Too Long

Symptom: Building an HNSW index on millions of rows takes hours, during which the table is locked.

Solution: Use the CONCURRENTLY option (pgvector 0.7+) or build during off-peak hours. Lowering the ef_construction parameter (default 64→32) speeds up the build but slightly reduces recall. Build in batches and merge.

Pitfall 2: Vector Dimension Mismatch Causes Insert Failure

Symptom: ERROR: expected 1536 dimensions, not 768.

Solution: Ensure the Embedding model output dimensions match the table's vector(N) definition. text-embedding-3-small outputs 1536 dimensions, text-embedding-3-large outputs 3072 dimensions. Modify the table definition: ALTER TABLE documents ALTER COLUMN embedding TYPE vector(768).

Pitfall 3: IVFFlat Index Recall Drops After Data Updates

Symptom: After adding large amounts of new data, search results become noticeably inaccurate.

Solution: IVFFlat's cluster centers are determined at index creation time; new data causes drift. Rebuild the index periodically: REINDEX INDEX documents_embedding_idx. Or use HNSW index instead (incremental-friendly).

Pitfall 4: Hybrid Query Performance Is Poor

Symptom: Vector + full-text hybrid query takes over 1 second.

Solution: Ensure both searches use their respective indexes — verify with EXPLAIN ANALYZE. Limit the candidate set size for each (50 each); RRF fusion is fast on small datasets. Avoid SELECT * in hybrid queries.

Pitfall 5: pgvector Uses Too Much Memory

Symptom: HNSW index memory usage is 2-3x the raw data size.

Solution: Adjust the m parameter (default 16→8) to reduce graph connectivity, lowering memory usage at the cost of some recall. Use IVFFlat instead of HNSW. Consider using half-precision vectors: vector(1536) → halfvec(1536) (pgvector 0.7+).

Error Troubleshooting

#	Error Message	Cause	Solution
1	`extension "vector" not available`	pgvector not installed	Use pgvector/pgvector Docker image or compile from source
2	`expected 1536 dimensions, not 768`	Vector dimension mismatch	Check Embedding model, modify table dimension definition
3	`operator does not exist: vector <=> unknown`	Missing type cast	Use `%s::vector` for explicit cast
4	`index row size exceeds maximum`	HNSW index row too large	Lower m parameter, or use halfvec
5	`cannot create index concurrently`	pgvector version doesn't support it	Upgrade to 0.7+ or use non-concurrent mode
6	`IVFFlat recall is low after inserts`	New data deviates from cluster centers	Periodically REINDEX or switch to HNSW
7	`out of memory during index build`	Insufficient memory for index build	Increase work_mem, build in batches
8	`invalid input syntax for type vector`	Invalid vector format	Ensure `[0.1, 0.2, ...]` format
9	`permission denied for schema public`	User lacks extension creation permission	Use superuser to run CREATE EXTENSION
10	`query timeout on hybrid search`	Hybrid query too slow	Limit candidate set size, add indexes, check EXPLAIN

Advanced Optimization

1. Half-Precision Vectors for Storage Savings

-- Use halfvec (half-precision float), storage halved
ALTER TABLE documents ALTER COLUMN embedding TYPE halfvec(1536);

-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);

2. Partitioned Tables for Large-Scale Search

CREATE TABLE documents_partitioned (
    LIKE documents INCLUDING DEFAULTS
) PARTITION BY LIST (category);

CREATE TABLE documents_devops PARTITION OF documents_partitioned FOR VALUES IN ('DevOps');
CREATE TABLE documents_frontend PARTITION OF documents_partitioned FOR VALUES IN ('Frontend Engineering');
CREATE TABLE documents_ai PARTITION OF documents_partitioned FOR VALUES IN ('AI & Big Data');

3. Adjusting HNSW Search Precision at Query Time

-- Increase ef_search for higher recall (default 40)
SET hnsw.ef_search = 200;

-- Decrease ef_search for faster queries
SET hnsw.ef_search = 20;

4. Batch Vector Generation Optimization

import psycopg2
from openai import OpenAI
import math

client = OpenAI()

def batch_embed(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch,
        )
        embeddings.extend([d.embedding for d in response.data])
    return embeddings

def bulk_insert_documents(docs: list[dict]):
    texts = [f"{d['title']}\n{d['content']}" for d in docs]
    embeddings = batch_embed(texts)

    cur = conn.cursor()
    for doc, emb in zip(docs, embeddings):
        cur.execute(
            "INSERT INTO documents (title, content, source, category, embedding) VALUES (%s, %s, %s, %s, %s)",
            (doc['title'], doc['content'], doc['source'], doc['category'], str(emb)),
        )
    conn.commit()

Comparison

Dimension	pgvector	Pinecone	Milvus	Weaviate	Qdrant
Deployment	PostgreSQL extension	Fully managed	Self-hosted/managed	Self-hosted/managed	Self-hosted/managed
Operational Cost	Very low (reuses PG)	High (per-query pricing)	High	High	Medium
Hybrid Query	Native SQL	Limited filtering	Limited filtering	GraphQL filtering	Filters
Transaction Support	ACID	None	None	None	None
Max Dimensions	16000	20000	32768	65535	65535
Index Types	HNSW/IVFFlat	Proprietary	HNSW/IVF/DiskANN	HNSW	HNSW
Horizontal Scaling	Requires Citus	Automatic	Automatic	Automatic	Automatic
Suitable Scale	Million-level	Billion-level	Billion-level	Hundred-million-level	Hundred-million-level
Data Consistency	Strong consistency	Eventual consistency	Eventual consistency	Eventual consistency	Eventual consistency

Summary & Outlook

Summary: For most RAG application scenarios, PostgreSQL + pgvector is completely sufficient. Its core advantages are zero additional operational cost, native SQL hybrid queries, and ACID transaction guarantees. HNSW index performance is excellent at the million-level, and hybrid queries (vector + full-text + RRF) deliver far higher recall than pure vector search. Only when data reaches the billion-level or horizontal scaling is needed should you consider a dedicated vector database. Start with pgvector and evolve as needed.

Recommended Online Tools

JSON Formatter (API response debugging): /en/json/format
Base64 Encoder/Decoder (vector data processing): /en/encode/base64
curl to Code (API testing): /en/dev/curl-to-code