PostgreSQL + pgvector Vector Search: Building RAG Retrieval Without a Vector Database in 2026

编程语言

PostgreSQL + pgvector Vector Search: Building RAG Retrieval Without a Vector Database in 2026

When you think of RAG, do you immediately reach for Pinecone, Milvus, or Weaviate? Introducing a whole new database system just for simple semantic search doubles your operational costs and makes data synchronization a headache. In 2026, PostgreSQL's pgvector extension is mature enough — your existing PostgreSQL can handle vector search without needing a separate vector database.


Background

Vector Search Basics

The core of vector search is transforming unstructured data like text and images into high-dimensional vectors (Embeddings), then finding semantically similar results by calculating distances between vectors (cosine similarity, Euclidean distance, etc.).

Concept Description Example
Embedding Maps data to vectors text-embedding-3-large outputs 3072-dimensional vectors
Cosine Similarity Cosine of the angle between vectors, range [-1,1] 1 means identical
Euclidean Distance Straight-line distance between vectors 0 means identical
Inner Product Dot product of vectors Suitable for normalized vectors
ANN Search Approximate Nearest Neighbor, trades accuracy for speed HNSW, IVFFlat

Index Types Supported by pgvector

Index Algorithm Build Speed Query Speed Recall Memory Usage Use Case
HNSW Graph index Slow Fast High High Small-medium scale, high recall
IVFFlat Inverted file index Fast Medium Medium Low Large scale, acceptable accuracy loss

Problem Analysis

Why Not Use a Dedicated Vector Database?

  1. Operational Cost: Adding a new database means extra work for backups, monitoring, upgrades, and high availability
  2. Data Consistency: Business data in PostgreSQL, vector data in a vector database — dual-write consistency is hard to guarantee
  3. Query Capability: Vector databases have limited filtering capabilities and can't do complex join queries
  4. Migration Risk: The vector database market is fragmented — Pinecone/Milvus/Weaviate APIs are incompatible

pgvector lets you do vector search + business filtering + join queries all within PostgreSQL — a one-stop solution.


Step-by-Step Guide

Step 1: Install the pgvector Extension

-- Connect to PostgreSQL
psql -U postgres -d mydb

-- Install the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify version
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- 0.8.0+

Using Docker:

# docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Step 2: Create a Vector Table

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    source TEXT,
    category TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    embedding vector(1536)
);

-- Create HNSW index (recommended for high-recall scenarios)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Create IVFFlat index (recommended for large-scale data)
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);

Step 3: Generate and Store Vectors

import psycopg2
from openai import OpenAI

client = OpenAI()
conn = psycopg2.connect("dbname=mydb user=postgres password=password")
cur = conn.cursor()

def generate_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def insert_document(title: str, content: str, source: str, category: str):
    full_text = f"{title}\n{content}"
    embedding = generate_embedding(full_text)

    cur.execute(
        "INSERT INTO documents (title, content, source, category, embedding) VALUES (%s, %s, %s, %s, %s)",
        (title, content, source, category, str(embedding)),
    )
    conn.commit()

insert_document(
    "K8s Gateway API Guide",
    "Gateway API is the next-generation traffic management standard for Kubernetes...",
    "toolsku.dev/blog",
    "DevOps",
)

Step 4: Vector Search Queries

-- Cosine similarity search
SELECT id, title, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE 1 - (embedding <=> $1) > 0.7
ORDER BY embedding <=> $1
LIMIT 10;

-- Vector search with filtering
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'DevOps'
  AND 1 - (embedding <=> $1) > 0.7
ORDER BY embedding <=> $1
LIMIT 10;

Step 5: Hybrid Query (Vector + Full-Text)

-- Create full-text search index
ALTER TABLE documents ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('simple', coalesce(title, '') || ' ' || coalesce(content, ''))) STORED;

CREATE INDEX ON documents USING gin(tsv);

-- Hybrid query: vector search + full-text search + RRF fusion
WITH vector_results AS (
    SELECT id, title,
           ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS vector_rank,
           1 - (embedding <=> $1) AS vector_score
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 50
),
text_results AS (
    SELECT id, title,
           ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, plainto_tsquery('simple', $2)) DESC) AS text_rank,
           ts_rank(tsv, plainto_tsquery('simple', $2)) AS text_score
    FROM documents
    WHERE tsv @@ plainto_tsquery('simple', $2)
    LIMIT 50
),
combined AS (
    SELECT
        COALESCE(v.id, t.id) AS id,
        COALESCE(v.title, t.title) AS title,
        COALESCE(1.0 / (60 + v.vector_rank), 0) +
        COALESCE(1.0 / (60 + t.text_rank), 0) AS rrf_score
    FROM vector_results v
    FULL OUTER JOIN text_results t ON v.id = t.id
)
SELECT id, title, rrf_score
FROM combined
ORDER BY rrf_score DESC
LIMIT 10;

Complete Code: RAG Retrieval System

import psycopg2
from openai import OpenAI
from dataclasses import dataclass

@dataclass
class SearchResult:
    id: int
    title: str
    content: str
    score: float
    source: str

class PgVectorRAG:
    def __init__(self, db_url: str, openai_api_key: str):
        self.conn = psycopg2.connect(db_url)
        self.client = OpenAI(api_key=openai_api_key)

    def embed(self, text: str) -> list[float]:
        response = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding

    def search(
        self,
        query: str,
        top_k: int = 10,
        threshold: float = 0.6,
        category: str | None = None,
        use_hybrid: bool = True,
    ) -> list[SearchResult]:
        query_embedding = self.embed(query)
        embedding_str = str(query_embedding)

        if use_hybrid:
            return self._hybrid_search(embedding_str, query, top_k, category)
        return self._vector_search(embedding_str, top_k, threshold, category)

    def _vector_search(
        self,
        embedding_str: str,
        top_k: int,
        threshold: float,
        category: str | None,
    ) -> list[SearchResult]:
        sql = """
            SELECT id, title, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            WHERE 1 - (embedding <=> %s::vector) > %s
        """
        params: list = [embedding_str, embedding_str, threshold]

        if category:
            sql += " AND category = %s"
            params.append(category)

        sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
        params.extend([embedding_str, top_k])

        cur = self.conn.cursor()
        cur.execute(sql, params)
        results = cur.fetchall()

        return [
            SearchResult(id=r[0], title=r[1], content=r[2], source=r[3], score=r[4])
            for r in results
        ]

    def _hybrid_search(
        self,
        embedding_str: str,
        query_text: str,
        top_k: int,
        category: str | None,
    ) -> list[SearchResult]:
        category_filter = "AND category = %s" if category else ""
        params: list = [embedding_str, embedding_str]

        if category:
            params.append(category)

        params.extend([embedding_str, query_text])

        if category:
            params.append(category)

        params.extend([embedding_str, top_k])

        sql = f"""
        WITH vector_results AS (
            SELECT id, title, content, source,
                   ROW_NUMBER() OVER (ORDER BY embedding <=> %s::vector) AS v_rank
            FROM documents
            WHERE 1 - (embedding <=> %s::vector) > 0.5
            {category_filter}
            ORDER BY embedding <=> %s::vector
            LIMIT 50
        ),
        text_results AS (
            SELECT id, title, content, source,
                   ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, plainto_tsquery('simple', %s)) DESC) AS t_rank
            FROM documents
            WHERE tsv @@ plainto_tsquery('simple', %s)
            {category_filter}
            LIMIT 50
        ),
        combined AS (
            SELECT
                COALESCE(v.id, t.id) AS id,
                COALESCE(v.title, t.title) AS title,
                COALESCE(v.content, t.content) AS content,
                COALESCE(v.source, t.source) AS source,
                COALESCE(1.0 / (60 + v.v_rank), 0) +
                COALESCE(1.0 / (60 + t.t_rank), 0) AS rrf_score
            FROM vector_results v
            FULL OUTER JOIN text_results t ON v.id = t.id
        )
        SELECT id, title, content, source, rrf_score
        FROM combined
        ORDER BY rrf_score DESC
        LIMIT %s
        """

        cur = self.conn.cursor()
        cur.execute(sql, params)
        results = cur.fetchall()

        return [
            SearchResult(id=r[0], title=r[1], content=r[2], source=r[3], score=r[4])
            for r in results
        ]

    def generate_answer(self, query: str, top_k: int = 5) -> str:
        results = self.search(query, top_k=top_k, use_hybrid=True)
        context = "\n\n".join(f"[{r.title}]\n{r.content}" for r in results)

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Answer the question based on the following reference materials. If the reference materials don't contain relevant information, please say so.\n\n" + context},
                {"role": "user", "content": query},
            ],
        )
        return response.choices[0].message.content


if __name__ == "__main__":
    rag = PgVectorRAG(
        db_url="dbname=mydb user=postgres password=password",
        openai_api_key="sk-xxx",
    )

    answer = rag.generate_answer("How to configure K8s Gateway API?")
    print(answer)

Common Pitfalls

Pitfall 1: HNSW Index Build Takes Too Long

Symptom: Building an HNSW index on millions of rows takes hours, during which the table is locked.

Solution: Use the CONCURRENTLY option (pgvector 0.7+) or build during off-peak hours. Lowering the ef_construction parameter (default 64→32) speeds up the build but slightly reduces recall. Build in batches and merge.

Pitfall 2: Vector Dimension Mismatch Causes Insert Failure

Symptom: ERROR: expected 1536 dimensions, not 768.

Solution: Ensure the Embedding model output dimensions match the table's vector(N) definition. text-embedding-3-small outputs 1536 dimensions, text-embedding-3-large outputs 3072 dimensions. Modify the table definition: ALTER TABLE documents ALTER COLUMN embedding TYPE vector(768).

Pitfall 3: IVFFlat Index Recall Drops After Data Updates

Symptom: After adding large amounts of new data, search results become noticeably inaccurate.

Solution: IVFFlat's cluster centers are determined at index creation time; new data causes drift. Rebuild the index periodically: REINDEX INDEX documents_embedding_idx. Or use HNSW index instead (incremental-friendly).

Pitfall 4: Hybrid Query Performance Is Poor

Symptom: Vector + full-text hybrid query takes over 1 second.

Solution: Ensure both searches use their respective indexes — verify with EXPLAIN ANALYZE. Limit the candidate set size for each (50 each); RRF fusion is fast on small datasets. Avoid SELECT * in hybrid queries.

Pitfall 5: pgvector Uses Too Much Memory

Symptom: HNSW index memory usage is 2-3x the raw data size.

Solution: Adjust the m parameter (default 16→8) to reduce graph connectivity, lowering memory usage at the cost of some recall. Use IVFFlat instead of HNSW. Consider using half-precision vectors: vector(1536)halfvec(1536) (pgvector 0.7+).


Error Troubleshooting

# Error Message Cause Solution
1 extension "vector" not available pgvector not installed Use pgvector/pgvector Docker image or compile from source
2 expected 1536 dimensions, not 768 Vector dimension mismatch Check Embedding model, modify table dimension definition
3 operator does not exist: vector <=> unknown Missing type cast Use %s::vector for explicit cast
4 index row size exceeds maximum HNSW index row too large Lower m parameter, or use halfvec
5 cannot create index concurrently pgvector version doesn't support it Upgrade to 0.7+ or use non-concurrent mode
6 IVFFlat recall is low after inserts New data deviates from cluster centers Periodically REINDEX or switch to HNSW
7 out of memory during index build Insufficient memory for index build Increase work_mem, build in batches
8 invalid input syntax for type vector Invalid vector format Ensure [0.1, 0.2, ...] format
9 permission denied for schema public User lacks extension creation permission Use superuser to run CREATE EXTENSION
10 query timeout on hybrid search Hybrid query too slow Limit candidate set size, add indexes, check EXPLAIN

Advanced Optimization

1. Half-Precision Vectors for Storage Savings

-- Use halfvec (half-precision float), storage halved
ALTER TABLE documents ALTER COLUMN embedding TYPE halfvec(1536);

-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE TABLE documents_partitioned (
    LIKE documents INCLUDING DEFAULTS
) PARTITION BY LIST (category);

CREATE TABLE documents_devops PARTITION OF documents_partitioned FOR VALUES IN ('DevOps');
CREATE TABLE documents_frontend PARTITION OF documents_partitioned FOR VALUES IN ('Frontend Engineering');
CREATE TABLE documents_ai PARTITION OF documents_partitioned FOR VALUES IN ('AI & Big Data');

3. Adjusting HNSW Search Precision at Query Time

-- Increase ef_search for higher recall (default 40)
SET hnsw.ef_search = 200;

-- Decrease ef_search for faster queries
SET hnsw.ef_search = 20;

4. Batch Vector Generation Optimization

import psycopg2
from openai import OpenAI
import math

client = OpenAI()

def batch_embed(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch,
        )
        embeddings.extend([d.embedding for d in response.data])
    return embeddings

def bulk_insert_documents(docs: list[dict]):
    texts = [f"{d['title']}\n{d['content']}" for d in docs]
    embeddings = batch_embed(texts)

    cur = conn.cursor()
    for doc, emb in zip(docs, embeddings):
        cur.execute(
            "INSERT INTO documents (title, content, source, category, embedding) VALUES (%s, %s, %s, %s, %s)",
            (doc['title'], doc['content'], doc['source'], doc['category'], str(emb)),
        )
    conn.commit()

Comparison

Dimension pgvector Pinecone Milvus Weaviate Qdrant
Deployment PostgreSQL extension Fully managed Self-hosted/managed Self-hosted/managed Self-hosted/managed
Operational Cost Very low (reuses PG) High (per-query pricing) High High Medium
Hybrid Query Native SQL Limited filtering Limited filtering GraphQL filtering Filters
Transaction Support ACID None None None None
Max Dimensions 16000 20000 32768 65535 65535
Index Types HNSW/IVFFlat Proprietary HNSW/IVF/DiskANN HNSW HNSW
Horizontal Scaling Requires Citus Automatic Automatic Automatic Automatic
Suitable Scale Million-level Billion-level Billion-level Hundred-million-level Hundred-million-level
Data Consistency Strong consistency Eventual consistency Eventual consistency Eventual consistency Eventual consistency

Summary & Outlook

Summary: For most RAG application scenarios, PostgreSQL + pgvector is completely sufficient. Its core advantages are zero additional operational cost, native SQL hybrid queries, and ACID transaction guarantees. HNSW index performance is excellent at the million-level, and hybrid queries (vector + full-text + RRF) deliver far higher recall than pure vector search. Only when data reaches the billion-level or horizontal scaling is needed should you consider a dedicated vector database. Start with pgvector and evolve as needed.


Try these browser-local tools — no sign-up required →

#PostgreSQL#pgvector#向量搜索#语义检索#RAG#HNSW#IVFFlat#混合查询