TiDB Vector Search for RAG: 5 Core Patterns for HTAP Semantic Retrieval
Why Pure Vector Databases Can't Sustain Production RAG
Your RAG system is live. A user searches for "refund policy from the last three months," and vector retrieval returns semantically similar but outdated documents. You want to add a time filter, but the vector database has no transactions — relational queries and vector queries live in separate systems, and data sync relies on manual scripts. Pure vector databases expose four critical flaws in production RAG:
- No transaction guarantees: Vector writes and business data updates aren't in the same transaction — consistency is a gamble
- Relational-vector disconnect: Scalar filtering only does metadata pre-screening, not true SQL + semantic retrieval fusion
- Complex data synchronization: Dual-write to business DB and vector DB, CDC pipeline maintenance is costly and laggy
- Doubled costs: Two databases mean double storage, double ops, double failure points
TiDB Vector Search solves all three with HTAP architecture: one database handles transactions, analytics, and vector retrieval — SQL and vector queries are natively fused.
Core Concepts at a Glance
| Concept | Description | Typical Implementation |
|---|---|---|
| TiDB Vector Search | TiDB native vector column type and HNSW index | TiDB 8.0+ |
| HTAP | Hybrid Transactional and Analytical Processing, row + column storage | TiDB TiFlash |
| Vector Index | Approximate nearest neighbor index structure for high-dimensional vectors | HNSW, IVF |
| HNSW | Hierarchical Navigable Small World graph-based ANN algorithm | ef_construction, max_level |
| Hybrid Query | Combined SQL scalar filtering and vector similarity query | WHERE + ORDER BY vec_cosine |
| Semantic Retrieval | Similarity search based on semantic embeddings | cosine, L2, inner product |
| Embedding Model | Neural network mapping text to dense vectors | text-embedding-3-small, bge-large |
| RAG | Retrieval-Augmented Generation, LLM + external knowledge retrieval | LangChain, LlamaIndex |
| Full-Text Search | Keyword search based on inverted index | MATCH AGAINST, FULLTEXT |
| Scalar Filtering | Applying conditional filters before/after vector retrieval | category='tech', date > '2026-01' |
Problem Analysis: 5 Major Challenges of TiDB Vector RAG
| # | Challenge | Manifestation | Impact |
|---|---|---|---|
| 1 | Vector-Relational Data Fusion | Vector and business columns in different tables, poor JOIN performance | High hybrid query latency, poor UX |
| 2 | Query Performance Optimization | HNSW parameter tuning is complex, hard to balance recall vs. latency on large datasets | Slow retrieval or insufficient recall |
| 3 | Embedding Model Selection | Significant differences in dimensions, languages, and domain adaptation | Poor embedding quality leads to low retrieval precision |
| 4 | Data Updates & Index Maintenance | Incremental writes trigger index rebuilds, large batches block online queries | Query timeouts during writes |
| 5 | Cost Control | Vector columns consume significant storage, HTAP cluster resource usage is high | Storage and compute costs exceed budget |
These five challenges are deeply interconnected: model choice determines vector dimensions and index parameters, index parameters affect query performance, data update strategies depend on index mechanisms, and cost is constrained by all the above. Production TiDB vector RAG must address these issues systematically.
Step-by-Step Implementation: 5 Core Patterns
Pattern 1: TiDB Vector Table Design & Index Creation
The first step in RAG — designing a table that fuses business fields with vector columns and creating HNSW indexes.
from sqlalchemy import create_engine, Column, Integer, String, Text, DateTime, Index
from sqlalchemy.orm import declarative_base, Session
from tidb_vector.sqlalchemy import VectorType
from datetime import datetime
Base = declarative_base()
class KnowledgeDoc(Base):
__tablename__ = "knowledge_docs"
id = Column(Integer, primary_key=True, autoincrement=True)
title = Column(String(255), nullable=False)
content = Column(Text, nullable=False)
category = Column(String(50), index=True)
source = Column(String(100))
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
embedding = Column(VectorType(1536))
__table_args__ = (
Index("idx_category_created", "category", "created_at"),
)
TIDB_CONN = "mysql+pymysql://root@127.0.0.1:4000/rag_db"
engine = create_engine(TIDB_CONN)
Base.metadata.create_all(engine)
def create_vector_index():
with engine.connect() as conn:
conn.execute("""
ALTER TABLE knowledge_docs
ADD VECTOR INDEX idx_embedding_vec
USING HNSW (embedding)
WITH (ef_construction = 128, m = 16)
""")
conn.commit()
create_vector_index()
print("Vector table and HNSW index created")
Best for: RAG knowledge base table design, multi-field hybrid query requirements.
Pattern 2: Embedding Generation & Batch Ingestion
Convert document content to vectors via embedding models and batch-write to TiDB.
import numpy as np
from openai import OpenAI
from sqlalchemy.orm import Session
class EmbeddingWriter:
def __init__(self, engine, api_key: str,
model: str = "text-embedding-3-small",
dim: int = 1536, batch_size: int = 100):
self.engine = engine
self.client = OpenAI(api_key=api_key)
self.model = model
self.dim = dim
self.batch_size = batch_size
def generate_embedding(self, text: str) -> list[float]:
response = self.client.embeddings.create(
input=text, model=self.model, dimensions=self.dim
)
return response.data[0].embedding
def batch_embed(self, texts: list[str]) -> list[list[float]]:
response = self.client.embeddings.create(
input=texts, model=self.model, dimensions=self.dim
)
return [item.embedding for item in response.data]
def write_documents(self, docs: list[dict]) -> int:
written = 0
for i in range(0, len(docs), self.batch_size):
batch = docs[i : i + self.batch_size]
texts = [d["content"] for d in batch]
embeddings = self.batch_embed(texts)
with Session(self.engine) as session:
for doc, emb in zip(batch, embeddings):
record = KnowledgeDoc(
title=doc["title"],
content=doc["content"],
category=doc.get("category", "general"),
source=doc.get("source", ""),
embedding=emb,
)
session.add(record)
session.commit()
written += len(batch)
return written
writer = EmbeddingWriter(engine, api_key="your-api-key")
docs = [
{"title": "Refund Policy", "content": "Q2 2026 refund policy: full refund within 30 days of purchase...", "category": "policy"},
{"title": "API Rate Limits", "content": "Free tier: 100 req/min, Paid tier: 1000 req/min...", "category": "tech"},
]
count = writer.write_documents(docs)
print(f"Wrote {count} documents")
Best for: Knowledge base bulk import, incremental document embedding ingestion.
Pattern 3: Semantic Retrieval with Scalar Filtering Hybrid Query
TiDB's core advantage — completing semantic retrieval and business condition filtering in a single SQL statement.
from dataclasses import dataclass
@dataclass
class SearchResult:
id: int
title: str
content: str
category: str
similarity: float
class HybridRetriever:
def __init__(self, engine, api_key: str,
model: str = "text-embedding-3-small"):
self.engine = engine
self.client = OpenAI(api_key=api_key)
self.model = model
def search(self, query: str, category: str | None = None,
date_after: str | None = None,
top_k: int = 5) -> list[SearchResult]:
query_emb = self.client.embeddings.create(
input=query, model=self.model
).data[0].embedding
emb_str = str(query_emb)
sql = """
SELECT id, title, content, category,
1 - vec_cosine_distance(embedding, %s) AS similarity
FROM knowledge_docs
WHERE 1=1
"""
params: list = [emb_str]
if category:
sql += " AND category = %s"
params.append(category)
if date_after:
sql += " AND created_at > %s"
params.append(date_after)
sql += " ORDER BY similarity DESC LIMIT %s"
params.append(top_k)
with self.engine.connect() as conn:
rows = conn.execute(sql, params).fetchall()
return [
SearchResult(
id=r[0], title=r[1], content=r[2],
category=r[3], similarity=round(r[4], 4)
)
for r in rows
]
retriever = HybridRetriever(engine, api_key="your-api-key")
results = retriever.search(
query="refund policy", category="policy", date_after="2026-01-01", top_k=3
)
for r in results:
print(f"[{r.similarity:.3f}] {r.title} ({r.category})")
Best for: Semantic search with business conditions, multi-dimensional filtered retrieval.
Pattern 4: Full-Text Search & Vector Retrieval Fusion
Dual-recall with keyword exact matching + semantic understanding, fused with RRF algorithm.
from dataclasses import dataclass
@dataclass
class FusedResult:
id: int
title: str
content: str
rrf_score: float
vector_rank: int
fulltext_rank: int
class FusedRetriever:
def __init__(self, engine, api_key: str, k: int = 60):
self.engine = engine
self.client = OpenAI(api_key=api_key)
self.k = k
def fused_search(self, query: str, top_k: int = 5) -> list[FusedResult]:
query_emb = self.client.embeddings.create(
input=query, model="text-embedding-3-small"
).data[0].embedding
emb_str = str(query_emb)
vec_sql = """
SELECT id, title, content,
ROW_NUMBER() OVER (ORDER BY vec_cosine_distance(embedding, %s)) AS vec_rank
FROM knowledge_docs
ORDER BY vec_cosine_distance(embedding, %s)
LIMIT 50
"""
ft_sql = """
SELECT id, title, content,
ROW_NUMBER() OVER (ORDER BY MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE)) AS ft_rank
FROM knowledge_docs
WHERE MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE)
LIMIT 50
"""
with self.engine.connect() as conn:
vec_rows = conn.execute(vec_sql, [emb_str, emb_str]).fetchall()
ft_rows = conn.execute(ft_sql, [query, query]).fetchall()
rrf_scores: dict[int, dict] = {}
for row in vec_rows:
doc_id = row[0]
rrf_scores[doc_id] = {
"title": row[1], "content": row[2],
"vec_rank": row[3], "ft_rank": 9999,
"rrf": 1.0 / (self.k + row[3]),
}
for row in ft_rows:
doc_id = row[0]
if doc_id in rrf_scores:
rrf_scores[doc_id]["ft_rank"] = row[3]
rrf_scores[doc_id]["rrf"] += 1.0 / (self.k + row[3])
else:
rrf_scores[doc_id] = {
"title": row[1], "content": row[2],
"vec_rank": 9999, "ft_rank": row[3],
"rrf": 1.0 / (self.k + row[3]),
}
sorted_results = sorted(
rrf_scores.items(), key=lambda x: x[1]["rrf"], reverse=True
)[:top_k]
return [
FusedResult(
id=doc_id, title=v["title"], content=v["content"],
rrf_score=round(v["rrf"], 5),
vector_rank=v["vec_rank"], fulltext_rank=v["ft_rank"],
)
for doc_id, v in sorted_results
]
fused = FusedRetriever(engine, api_key="your-api-key")
results = fused.fused_search("refund policy 2026", top_k=5)
for r in results:
print(f"[RRF:{r.rrf_score:.4f} V:{r.vector_rank} F:{r.fulltext_rank}] {r.title}")
Best for: Keyword + semantic dual-recall, exact matching for technical terms.
Pattern 5: End-to-End RAG Application Construction
Chain retrieval, reranking, and generation into a complete RAG pipeline.
from dataclasses import dataclass
@dataclass
class RAGConfig:
retrieval_top_k: int = 10
rerank_top_n: int = 3
max_context_tokens: int = 3000
llm_model: str = "gpt-4o-mini"
embedding_model: str = "text-embedding-3-small"
class TiDBRAGPipeline:
def __init__(self, engine, api_key: str, config: RAGConfig | None = None):
self.config = config or RAGConfig()
self.engine = engine
self.client = OpenAI(api_key=api_key)
self.retriever = HybridRetriever(
engine, api_key, model=self.config.embedding_model
)
def _rerank(self, query: str, results: list[SearchResult]) -> list[SearchResult]:
scored = []
for r in results:
overlap = sum(
1 for w in query.split() if w.lower() in r.content.lower()
) / max(len(query.split()), 1)
final_score = 0.7 * r.similarity + 0.3 * overlap
scored.append((r, final_score))
scored.sort(key=lambda x: x[1], reverse=True)
return [r for r, _ in scored[: self.config.rerank_top_n]]
def query(self, question: str, category: str | None = None) -> str:
results = self.retriever.search(
query=question, category=category,
top_k=self.config.retrieval_top_k
)
reranked = self._rerank(question, results)
context = "\n\n".join(
f"[{r.title}] ({r.category})\n{r.content}" for r in reranked
)
if len(context) > self.config.max_context_tokens * 4:
context = context[: self.config.max_context_tokens * 4]
prompt = (
"Answer the question based on the following retrieval results. "
"If the context lacks sufficient information, state it clearly.\n\n"
f"Context:\n{context}\n\nQuestion: {question}"
)
response = self.client.chat.completions.create(
model=self.config.llm_model,
messages=[{"role": "user", "content": prompt}],
max_tokens=800,
)
return response.choices[0].message.content
pipeline = TiDBRAGPipeline(engine, api_key="your-api-key")
answer = pipeline.query("What is the Q2 2026 refund policy?", category="policy")
print(answer)
Best for: Production-grade RAG Q&A systems, enterprise knowledge base assistants.
Pitfall Guide: 5 Common Traps
Trap 1: Querying vector columns without an index
❌ Wrong:
SELECT * FROM knowledge_docs
ORDER BY vec_cosine_distance(embedding, @query_emb)
LIMIT 10;
✅ Right:
ALTER TABLE knowledge_docs
ADD VECTOR INDEX idx_vec USING HNSW (embedding)
WITH (ef_construction = 128, m = 16);
-- Set ef_search at query time
SET SESSION tidb_vector_ef_search = 64;
SELECT * FROM knowledge_docs
ORDER BY vec_cosine_distance(embedding, @query_emb)
LIMIT 10;
Trap 2: Embedding dimension mismatch with index dimension
❌ Wrong:
embedding = Column(VectorType(768)) # Table defined as 768-dim
# But writing 1536-dim vectors from text-embedding-3-small
✅ Right:
EMBEDDING_DIM = 1536
class KnowledgeDoc(Base):
embedding = Column(VectorType(EMBEDDING_DIM))
# Explicitly specify dimensions when generating embeddings
response = client.embeddings.create(
input=text, model="text-embedding-3-small", dimensions=EMBEDDING_DIM
)
Trap 3: Not disabling index during bulk writes
❌ Wrong:
for doc in large_doc_list:
session.add(KnowledgeDoc(embedding=doc["emb"]))
session.commit() # Each write triggers index update
✅ Right:
# Disable vector index first, rebuild after bulk load
with engine.connect() as conn:
conn.execute("ALTER TABLE knowledge_docs ALTER INDEX idx_vec INVISIBLE")
conn.commit()
bulk_write(large_doc_list)
with engine.connect() as conn:
conn.execute("ALTER TABLE knowledge_docs ALTER INDEX idx_vec VISIBLE")
conn.commit()
Trap 4: Applying scalar filtering after vector ranking
❌ Wrong:
SELECT * FROM knowledge_docs
ORDER BY vec_cosine_distance(embedding, @emb)
LIMIT 100;
-- Then filter category in application code
✅ Right:
SELECT * FROM knowledge_docs
WHERE category = 'policy' AND created_at > '2026-01-01'
ORDER BY vec_cosine_distance(embedding, @emb)
LIMIT 10;
Trap 5: Feeding untruncated RAG context directly to LLM
❌ Wrong:
context = "\n".join(r.content for r in all_results)
prompt = f"Context: {context}\nQuestion: {question}"
✅ Right:
MAX_CTX_TOKENS = 3000
def truncate_context(results: list[SearchResult]) -> str:
parts, total = [], 0
for r in results:
est_tokens = len(r.content) // 4
if total + est_tokens > MAX_CTX_TOKENS:
break
parts.append(f"[{r.title}]\n{r.content}")
total += est_tokens
return "\n\n".join(parts)
Error Troubleshooting: 10 Common Errors
| # | Error Message | Cause | Solution |
|---|---|---|---|
| 1 | Vector dimension mismatch: expected 1536 got 768 |
Embedding dimension doesn't match table definition | Unify embedding model dimensions or modify VectorType parameter |
| 2 | HNSW index build OOM |
ef_construction too large or data exceeds memory | Lower ef_construction to 64, build index in batches |
| 3 | vec_cosine_distance function not found |
TiDB version below 8.0 or vector feature not enabled | Upgrade TiDB to 8.0+, confirm vector plugin is loaded |
| 4 | FULLTEXT index not found on column content |
Full-text index not created | ALTER TABLE knowledge_docs ADD FULLTEXT INDEX ft_content(content) |
| 5 | Query timeout during vector search |
HNSW ef_search too large or data exceeds expectations | Lower ef_search, add scalar filtering to narrow scope |
| 6 | Duplicate entry for key 'PRIMARY' |
ID conflict during bulk writes | Use auto_increment or specify ID ranges explicitly |
| 7 | Embedding API rate limit exceeded |
Embedding API call frequency exceeds limit | Add request intervals, use batch API, implement backoff retry |
| 8 | TiDB connection pool exhausted |
Too many concurrent queries, connections not released | Configure pool size, use with statements to ensure connection return |
| 9 | Index invisible after bulk load |
Forgot to restore index visibility after bulk write | Execute ALTER INDEX idx_vec VISIBLE and verify |
| 10 | RRF fusion returns empty results |
Both vector and full-text retrieval return no matches | Relax similarity threshold, add fallback retrieval strategy |
Advanced Optimization: 4 Key Techniques
1. HNSW Parameter Auto-Tuning
class HNSWAutoTuner:
def __init__(self, engine):
self.engine = engine
def recommend_params(self, row_count: int,
recall_target: float = 0.95) -> dict:
if row_count < 100_000:
return {"ef_construction": 64, "m": 8, "ef_search": 40}
elif row_count < 1_000_000:
return {"ef_construction": 128, "m": 16, "ef_search": 64}
else:
return {"ef_construction": 256, "m": 24, "ef_search": 128}
def apply(self, table: str, params: dict):
with self.engine.connect() as conn:
conn.execute(
f"ALTER TABLE {table} ADD VECTOR INDEX idx_vec "
f"USING HNSW (embedding) WITH "
f"(ef_construction={params['ef_construction']}, m={params['m']})"
)
conn.execute(
f"SET SESSION tidb_vector_ef_search = {params['ef_search']}"
)
conn.commit()
2. Embedding Cache to Avoid Redundant Computation
import hashlib
import json
class EmbeddingCache:
def __init__(self, engine, ttl_hours: int = 24):
self.engine = engine
self.ttl = ttl_hours
self._local: dict[str, list[float]] = {}
def _cache_key(self, text: str, model: str) -> str:
raw = f"{model}:{text}"
return hashlib.md5(raw.encode()).hexdigest()
def get_or_compute(self, text: str, model: str,
compute_fn) -> list[float]:
key = self._cache_key(text, model)
if key in self._local:
return self._local[key]
emb = compute_fn(text, model)
self._local[key] = emb
return emb
3. Query Router: Auto-Select Retrieval Strategy
class QueryRouter:
def route(self, query: str) -> str:
has_exact_terms = any(
kw in query.lower() for kw in ["id", "code", "version", "number"]
)
is_recent = any(
kw in query.lower() for kw in ["latest", "recent", "today", "this week"]
)
if has_exact_terms and is_recent:
return "fused"
elif has_exact_terms:
return "fulltext"
elif is_recent:
return "hybrid"
else:
return "vector"
4. Vector Search Performance Monitoring
class VectorSearchMonitor:
def __init__(self, engine):
self.engine = engine
def get_index_stats(self, table: str) -> dict:
with self.engine.connect() as conn:
count = conn.execute(
f"SELECT COUNT(*) FROM {table}"
).scalar()
null_embs = conn.execute(
f"SELECT COUNT(*) FROM {table} WHERE embedding IS NULL"
).scalar()
return {
"total_rows": count,
"null_embeddings": null_embs,
"coverage": round((count - null_embs) / count, 4) if count else 0,
}
Comparison: 4 Vector Retrieval Solutions
| Dimension | TiDB Vector | Pinecone | Milvus | pgvector |
|---|---|---|---|---|
| SQL+Vector Fusion | Native | Not supported | Scalar filtering | Native |
| ACID Transactions | Full ACID | None | Limited | Full ACID |
| HTAP Analytics | Row+Column storage | None | None | None |
| Horizontal Scaling | Auto-scaling | Auto | Manual sharding | Requires Citus |
| Index Types | HNSW | Auto | IVF/HNSW/DiskANN | HNSW/IVFFlat |
| Max Dimensions | 16384 | 2048 | 32768 | 2000 (HNSW) |
| Full-Text Search | Native FULLTEXT | None | None | tsvector |
| Deployment | Self-hosted/Serverless | Cloud only | Self-hosted/Cloud | Self-hosted |
| Ops Complexity | Low (managed) | Very low | High | Medium |
| Best For | RAG+Business fusion | Pure vector SaaS | Large-scale vectors | PG ecosystem |
TiDB Vector's core advantage: one database handles SQL + vector + full-text + transactions — no data sync pipelines needed.
Summary & Outlook
TiDB Vector Search is becoming a key infrastructure for RAG production deployment in 2026:
- Serverless Vectors: TiDB Cloud Serverless bills by vector query volume — zero-ops RAG launch
- Multimodal Vectors: Image and audio embeddings stored alongside text, cross-modal retrieval
- Streaming Embedding Updates: CDC-driven automatic embedding recomputation, data changes reflected in vector index in real-time
- Adaptive Indexing: Auto-adjusts HNSW parameters based on query patterns, no manual tuning
- RAG Evaluation Standardization: Built-in retrieval quality metrics in TiDB, end-to-end RAG evaluation out of the box
Choosing TiDB vector RAG: if your RAG needs SQL filtering, transaction guarantees, or business data fusion, TiDB is the only one-stop solution. For pure vector retrieval, consider Pinecone; for ultra-large-scale vector stores, consider Milvus. Start with TiDB Cloud Serverless free tier to validate.
Recommended Online Tools
- JSON Formatter — Format embedding vectors and retrieval result JSON data
- Hash Calculator — Compute document fingerprint and embedding cache MD5/SHA hashes
- Curl to Code — Convert TiDB API and embedding interface debug curls to Python code
Try these browser-local tools — no sign-up required →