PostgreSQL + pgvector Vector Search: Building RAG Retrieval Without a Vector Database in 2026
PostgreSQL + pgvector Vector Search: Building RAG Retrieval Without a Vector Database in 2026
When you think of RAG, do you immediately reach for Pinecone, Milvus, or Weaviate? Introducing a whole new database system just for simple semantic search doubles your operational costs and makes data synchronization a headache. In 2026, PostgreSQL's pgvector extension is mature enough — your existing PostgreSQL can handle vector search without needing a separate vector database.
Background
Vector Search Basics
The core of vector search is transforming unstructured data like text and images into high-dimensional vectors (Embeddings), then finding semantically similar results by calculating distances between vectors (cosine similarity, Euclidean distance, etc.).
| Concept | Description | Example |
|---|---|---|
| Embedding | Maps data to vectors | text-embedding-3-large outputs 3072-dimensional vectors |
| Cosine Similarity | Cosine of the angle between vectors, range [-1,1] | 1 means identical |
| Euclidean Distance | Straight-line distance between vectors | 0 means identical |
| Inner Product | Dot product of vectors | Suitable for normalized vectors |
| ANN Search | Approximate Nearest Neighbor, trades accuracy for speed | HNSW, IVFFlat |
Index Types Supported by pgvector
| Index | Algorithm | Build Speed | Query Speed | Recall | Memory Usage | Use Case |
|---|---|---|---|---|---|---|
| HNSW | Graph index | Slow | Fast | High | High | Small-medium scale, high recall |
| IVFFlat | Inverted file index | Fast | Medium | Medium | Low | Large scale, acceptable accuracy loss |
Problem Analysis
Why Not Use a Dedicated Vector Database?
- Operational Cost: Adding a new database means extra work for backups, monitoring, upgrades, and high availability
- Data Consistency: Business data in PostgreSQL, vector data in a vector database — dual-write consistency is hard to guarantee
- Query Capability: Vector databases have limited filtering capabilities and can't do complex join queries
- Migration Risk: The vector database market is fragmented — Pinecone/Milvus/Weaviate APIs are incompatible
pgvector lets you do vector search + business filtering + join queries all within PostgreSQL — a one-stop solution.
Step-by-Step Guide
Step 1: Install the pgvector Extension
-- Connect to PostgreSQL
psql -U postgres -d mydb
-- Install the extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify version
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- 0.8.0+
Using Docker:
# docker-compose.yml
services:
postgres:
image: pgvector/pgvector:pg16
environment:
POSTGRES_DB: mydb
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
ports:
- "5432:5432"
volumes:
- pgdata:/var/lib/postgresql/data
volumes:
pgdata:
Step 2: Create a Vector Table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
source TEXT,
category TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
embedding vector(1536)
);
-- Create HNSW index (recommended for high-recall scenarios)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Create IVFFlat index (recommended for large-scale data)
-- CREATE INDEX ON documents
-- USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);
Step 3: Generate and Store Vectors
import psycopg2
from openai import OpenAI
client = OpenAI()
conn = psycopg2.connect("dbname=mydb user=postgres password=password")
cur = conn.cursor()
def generate_embedding(text: str) -> list[float]:
response = client.embeddings.create(
model="text-embedding-3-small",
input=text,
)
return response.data[0].embedding
def insert_document(title: str, content: str, source: str, category: str):
full_text = f"{title}\n{content}"
embedding = generate_embedding(full_text)
cur.execute(
"INSERT INTO documents (title, content, source, category, embedding) VALUES (%s, %s, %s, %s, %s)",
(title, content, source, category, str(embedding)),
)
conn.commit()
insert_document(
"K8s Gateway API Guide",
"Gateway API is the next-generation traffic management standard for Kubernetes...",
"toolsku.dev/blog",
"DevOps",
)
Step 4: Vector Search Queries
-- Cosine similarity search
SELECT id, title, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE 1 - (embedding <=> $1) > 0.7
ORDER BY embedding <=> $1
LIMIT 10;
-- Vector search with filtering
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'DevOps'
AND 1 - (embedding <=> $1) > 0.7
ORDER BY embedding <=> $1
LIMIT 10;
Step 5: Hybrid Query (Vector + Full-Text)
-- Create full-text search index
ALTER TABLE documents ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (to_tsvector('simple', coalesce(title, '') || ' ' || coalesce(content, ''))) STORED;
CREATE INDEX ON documents USING gin(tsv);
-- Hybrid query: vector search + full-text search + RRF fusion
WITH vector_results AS (
SELECT id, title,
ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS vector_rank,
1 - (embedding <=> $1) AS vector_score
FROM documents
ORDER BY embedding <=> $1
LIMIT 50
),
text_results AS (
SELECT id, title,
ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, plainto_tsquery('simple', $2)) DESC) AS text_rank,
ts_rank(tsv, plainto_tsquery('simple', $2)) AS text_score
FROM documents
WHERE tsv @@ plainto_tsquery('simple', $2)
LIMIT 50
),
combined AS (
SELECT
COALESCE(v.id, t.id) AS id,
COALESCE(v.title, t.title) AS title,
COALESCE(1.0 / (60 + v.vector_rank), 0) +
COALESCE(1.0 / (60 + t.text_rank), 0) AS rrf_score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
)
SELECT id, title, rrf_score
FROM combined
ORDER BY rrf_score DESC
LIMIT 10;
Complete Code: RAG Retrieval System
import psycopg2
from openai import OpenAI
from dataclasses import dataclass
@dataclass
class SearchResult:
id: int
title: str
content: str
score: float
source: str
class PgVectorRAG:
def __init__(self, db_url: str, openai_api_key: str):
self.conn = psycopg2.connect(db_url)
self.client = OpenAI(api_key=openai_api_key)
def embed(self, text: str) -> list[float]:
response = self.client.embeddings.create(
model="text-embedding-3-small",
input=text,
)
return response.data[0].embedding
def search(
self,
query: str,
top_k: int = 10,
threshold: float = 0.6,
category: str | None = None,
use_hybrid: bool = True,
) -> list[SearchResult]:
query_embedding = self.embed(query)
embedding_str = str(query_embedding)
if use_hybrid:
return self._hybrid_search(embedding_str, query, top_k, category)
return self._vector_search(embedding_str, top_k, threshold, category)
def _vector_search(
self,
embedding_str: str,
top_k: int,
threshold: float,
category: str | None,
) -> list[SearchResult]:
sql = """
SELECT id, title, content, source,
1 - (embedding <=> %s::vector) AS similarity
FROM documents
WHERE 1 - (embedding <=> %s::vector) > %s
"""
params: list = [embedding_str, embedding_str, threshold]
if category:
sql += " AND category = %s"
params.append(category)
sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
params.extend([embedding_str, top_k])
cur = self.conn.cursor()
cur.execute(sql, params)
results = cur.fetchall()
return [
SearchResult(id=r[0], title=r[1], content=r[2], source=r[3], score=r[4])
for r in results
]
def _hybrid_search(
self,
embedding_str: str,
query_text: str,
top_k: int,
category: str | None,
) -> list[SearchResult]:
category_filter = "AND category = %s" if category else ""
params: list = [embedding_str, embedding_str]
if category:
params.append(category)
params.extend([embedding_str, query_text])
if category:
params.append(category)
params.extend([embedding_str, top_k])
sql = f"""
WITH vector_results AS (
SELECT id, title, content, source,
ROW_NUMBER() OVER (ORDER BY embedding <=> %s::vector) AS v_rank
FROM documents
WHERE 1 - (embedding <=> %s::vector) > 0.5
{category_filter}
ORDER BY embedding <=> %s::vector
LIMIT 50
),
text_results AS (
SELECT id, title, content, source,
ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, plainto_tsquery('simple', %s)) DESC) AS t_rank
FROM documents
WHERE tsv @@ plainto_tsquery('simple', %s)
{category_filter}
LIMIT 50
),
combined AS (
SELECT
COALESCE(v.id, t.id) AS id,
COALESCE(v.title, t.title) AS title,
COALESCE(v.content, t.content) AS content,
COALESCE(v.source, t.source) AS source,
COALESCE(1.0 / (60 + v.v_rank), 0) +
COALESCE(1.0 / (60 + t.t_rank), 0) AS rrf_score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
)
SELECT id, title, content, source, rrf_score
FROM combined
ORDER BY rrf_score DESC
LIMIT %s
"""
cur = self.conn.cursor()
cur.execute(sql, params)
results = cur.fetchall()
return [
SearchResult(id=r[0], title=r[1], content=r[2], source=r[3], score=r[4])
for r in results
]
def generate_answer(self, query: str, top_k: int = 5) -> str:
results = self.search(query, top_k=top_k, use_hybrid=True)
context = "\n\n".join(f"[{r.title}]\n{r.content}" for r in results)
response = self.client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "Answer the question based on the following reference materials. If the reference materials don't contain relevant information, please say so.\n\n" + context},
{"role": "user", "content": query},
],
)
return response.choices[0].message.content
if __name__ == "__main__":
rag = PgVectorRAG(
db_url="dbname=mydb user=postgres password=password",
openai_api_key="sk-xxx",
)
answer = rag.generate_answer("How to configure K8s Gateway API?")
print(answer)
Common Pitfalls
Pitfall 1: HNSW Index Build Takes Too Long
Symptom: Building an HNSW index on millions of rows takes hours, during which the table is locked.
Solution: Use the CONCURRENTLY option (pgvector 0.7+) or build during off-peak hours. Lowering the ef_construction parameter (default 64→32) speeds up the build but slightly reduces recall. Build in batches and merge.
Pitfall 2: Vector Dimension Mismatch Causes Insert Failure
Symptom: ERROR: expected 1536 dimensions, not 768.
Solution: Ensure the Embedding model output dimensions match the table's vector(N) definition. text-embedding-3-small outputs 1536 dimensions, text-embedding-3-large outputs 3072 dimensions. Modify the table definition: ALTER TABLE documents ALTER COLUMN embedding TYPE vector(768).
Pitfall 3: IVFFlat Index Recall Drops After Data Updates
Symptom: After adding large amounts of new data, search results become noticeably inaccurate.
Solution: IVFFlat's cluster centers are determined at index creation time; new data causes drift. Rebuild the index periodically: REINDEX INDEX documents_embedding_idx. Or use HNSW index instead (incremental-friendly).
Pitfall 4: Hybrid Query Performance Is Poor
Symptom: Vector + full-text hybrid query takes over 1 second.
Solution: Ensure both searches use their respective indexes — verify with EXPLAIN ANALYZE. Limit the candidate set size for each (50 each); RRF fusion is fast on small datasets. Avoid SELECT * in hybrid queries.
Pitfall 5: pgvector Uses Too Much Memory
Symptom: HNSW index memory usage is 2-3x the raw data size.
Solution: Adjust the m parameter (default 16→8) to reduce graph connectivity, lowering memory usage at the cost of some recall. Use IVFFlat instead of HNSW. Consider using half-precision vectors: vector(1536) → halfvec(1536) (pgvector 0.7+).
Error Troubleshooting
| # | Error Message | Cause | Solution |
|---|---|---|---|
| 1 | extension "vector" not available |
pgvector not installed | Use pgvector/pgvector Docker image or compile from source |
| 2 | expected 1536 dimensions, not 768 |
Vector dimension mismatch | Check Embedding model, modify table dimension definition |
| 3 | operator does not exist: vector <=> unknown |
Missing type cast | Use %s::vector for explicit cast |
| 4 | index row size exceeds maximum |
HNSW index row too large | Lower m parameter, or use halfvec |
| 5 | cannot create index concurrently |
pgvector version doesn't support it | Upgrade to 0.7+ or use non-concurrent mode |
| 6 | IVFFlat recall is low after inserts |
New data deviates from cluster centers | Periodically REINDEX or switch to HNSW |
| 7 | out of memory during index build |
Insufficient memory for index build | Increase work_mem, build in batches |
| 8 | invalid input syntax for type vector |
Invalid vector format | Ensure [0.1, 0.2, ...] format |
| 9 | permission denied for schema public |
User lacks extension creation permission | Use superuser to run CREATE EXTENSION |
| 10 | query timeout on hybrid search |
Hybrid query too slow | Limit candidate set size, add indexes, check EXPLAIN |
Advanced Optimization
1. Half-Precision Vectors for Storage Savings
-- Use halfvec (half-precision float), storage halved
ALTER TABLE documents ALTER COLUMN embedding TYPE halfvec(1536);
-- Create HNSW index
CREATE INDEX ON documents
USING hnsw (embedding halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);
2. Partitioned Tables for Large-Scale Search
CREATE TABLE documents_partitioned (
LIKE documents INCLUDING DEFAULTS
) PARTITION BY LIST (category);
CREATE TABLE documents_devops PARTITION OF documents_partitioned FOR VALUES IN ('DevOps');
CREATE TABLE documents_frontend PARTITION OF documents_partitioned FOR VALUES IN ('Frontend Engineering');
CREATE TABLE documents_ai PARTITION OF documents_partitioned FOR VALUES IN ('AI & Big Data');
3. Adjusting HNSW Search Precision at Query Time
-- Increase ef_search for higher recall (default 40)
SET hnsw.ef_search = 200;
-- Decrease ef_search for faster queries
SET hnsw.ef_search = 20;
4. Batch Vector Generation Optimization
import psycopg2
from openai import OpenAI
import math
client = OpenAI()
def batch_embed(texts: list[str], batch_size: int = 100) -> list[list[float]]:
embeddings = []
for i in range(0, len(texts), batch_size):
batch = texts[i:i + batch_size]
response = client.embeddings.create(
model="text-embedding-3-small",
input=batch,
)
embeddings.extend([d.embedding for d in response.data])
return embeddings
def bulk_insert_documents(docs: list[dict]):
texts = [f"{d['title']}\n{d['content']}" for d in docs]
embeddings = batch_embed(texts)
cur = conn.cursor()
for doc, emb in zip(docs, embeddings):
cur.execute(
"INSERT INTO documents (title, content, source, category, embedding) VALUES (%s, %s, %s, %s, %s)",
(doc['title'], doc['content'], doc['source'], doc['category'], str(emb)),
)
conn.commit()
Comparison
| Dimension | pgvector | Pinecone | Milvus | Weaviate | Qdrant |
|---|---|---|---|---|---|
| Deployment | PostgreSQL extension | Fully managed | Self-hosted/managed | Self-hosted/managed | Self-hosted/managed |
| Operational Cost | Very low (reuses PG) | High (per-query pricing) | High | High | Medium |
| Hybrid Query | Native SQL | Limited filtering | Limited filtering | GraphQL filtering | Filters |
| Transaction Support | ACID | None | None | None | None |
| Max Dimensions | 16000 | 20000 | 32768 | 65535 | 65535 |
| Index Types | HNSW/IVFFlat | Proprietary | HNSW/IVF/DiskANN | HNSW | HNSW |
| Horizontal Scaling | Requires Citus | Automatic | Automatic | Automatic | Automatic |
| Suitable Scale | Million-level | Billion-level | Billion-level | Hundred-million-level | Hundred-million-level |
| Data Consistency | Strong consistency | Eventual consistency | Eventual consistency | Eventual consistency | Eventual consistency |
Summary & Outlook
Summary: For most RAG application scenarios, PostgreSQL + pgvector is completely sufficient. Its core advantages are zero additional operational cost, native SQL hybrid queries, and ACID transaction guarantees. HNSW index performance is excellent at the million-level, and hybrid queries (vector + full-text + RRF) deliver far higher recall than pure vector search. Only when data reaches the billion-level or horizontal scaling is needed should you consider a dedicated vector database. Start with pgvector and evolve as needed.
Recommended Online Tools
- JSON Formatter (API response debugging): /en/json/format
- Base64 Encoder/Decoder (vector data processing): /en/encode/base64
- curl to Code (API testing): /en/dev/curl-to-code
Try these browser-local tools — no sign-up required →