TiDB Vector Search for RAG: 5 Core Patterns for HTAP Semantic Retrieval

Why Pure Vector Databases Can't Sustain Production RAG

Your RAG system is live. A user searches for "refund policy from the last three months," and vector retrieval returns semantically similar but outdated documents. You want to add a time filter, but the vector database has no transactions — relational queries and vector queries live in separate systems, and data sync relies on manual scripts. Pure vector databases expose four critical flaws in production RAG:

No transaction guarantees: Vector writes and business data updates aren't in the same transaction — consistency is a gamble
Relational-vector disconnect: Scalar filtering only does metadata pre-screening, not true SQL + semantic retrieval fusion
Complex data synchronization: Dual-write to business DB and vector DB, CDC pipeline maintenance is costly and laggy
Doubled costs: Two databases mean double storage, double ops, double failure points

TiDB Vector Search solves all three with HTAP architecture: one database handles transactions, analytics, and vector retrieval — SQL and vector queries are natively fused.

Core Concepts at a Glance

Concept	Description	Typical Implementation
TiDB Vector Search	TiDB native vector column type and HNSW index	TiDB 8.0+
HTAP	Hybrid Transactional and Analytical Processing, row + column storage	TiDB TiFlash
Vector Index	Approximate nearest neighbor index structure for high-dimensional vectors	HNSW, IVF
HNSW	Hierarchical Navigable Small World graph-based ANN algorithm	ef_construction, max_level
Hybrid Query	Combined SQL scalar filtering and vector similarity query	WHERE + ORDER BY vec_cosine
Semantic Retrieval	Similarity search based on semantic embeddings	cosine, L2, inner product
Embedding Model	Neural network mapping text to dense vectors	text-embedding-3-small, bge-large
RAG	Retrieval-Augmented Generation, LLM + external knowledge retrieval	LangChain, LlamaIndex
Full-Text Search	Keyword search based on inverted index	MATCH AGAINST, FULLTEXT
Scalar Filtering	Applying conditional filters before/after vector retrieval	category='tech', date > '2026-01'

Problem Analysis: 5 Major Challenges of TiDB Vector RAG

#	Challenge	Manifestation	Impact
1	Vector-Relational Data Fusion	Vector and business columns in different tables, poor JOIN performance	High hybrid query latency, poor UX
2	Query Performance Optimization	HNSW parameter tuning is complex, hard to balance recall vs. latency on large datasets	Slow retrieval or insufficient recall
3	Embedding Model Selection	Significant differences in dimensions, languages, and domain adaptation	Poor embedding quality leads to low retrieval precision
4	Data Updates & Index Maintenance	Incremental writes trigger index rebuilds, large batches block online queries	Query timeouts during writes
5	Cost Control	Vector columns consume significant storage, HTAP cluster resource usage is high	Storage and compute costs exceed budget

These five challenges are deeply interconnected: model choice determines vector dimensions and index parameters, index parameters affect query performance, data update strategies depend on index mechanisms, and cost is constrained by all the above. Production TiDB vector RAG must address these issues systematically.

Step-by-Step Implementation: 5 Core Patterns

Pattern 1: TiDB Vector Table Design & Index Creation

The first step in RAG — designing a table that fuses business fields with vector columns and creating HNSW indexes.

from sqlalchemy import create_engine, Column, Integer, String, Text, DateTime, Index
from sqlalchemy.orm import declarative_base, Session
from tidb_vector.sqlalchemy import VectorType
from datetime import datetime

Base = declarative_base()


class KnowledgeDoc(Base):
    __tablename__ = "knowledge_docs"

    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(255), nullable=False)
    content = Column(Text, nullable=False)
    category = Column(String(50), index=True)
    source = Column(String(100))
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    embedding = Column(VectorType(1536))

    __table_args__ = (
        Index("idx_category_created", "category", "created_at"),
    )


TIDB_CONN = "mysql+pymysql://root@127.0.0.1:4000/rag_db"
engine = create_engine(TIDB_CONN)
Base.metadata.create_all(engine)


def create_vector_index():
    with engine.connect() as conn:
        conn.execute("""
            ALTER TABLE knowledge_docs
            ADD VECTOR INDEX idx_embedding_vec
            USING HNSW (embedding)
            WITH (ef_construction = 128, m = 16)
        """)
        conn.commit()


create_vector_index()
print("Vector table and HNSW index created")

Best for: RAG knowledge base table design, multi-field hybrid query requirements.

Pattern 2: Embedding Generation & Batch Ingestion

Convert document content to vectors via embedding models and batch-write to TiDB.

import numpy as np
from openai import OpenAI
from sqlalchemy.orm import Session


class EmbeddingWriter:
    def __init__(self, engine, api_key: str,
                 model: str = "text-embedding-3-small",
                 dim: int = 1536, batch_size: int = 100):
        self.engine = engine
        self.client = OpenAI(api_key=api_key)
        self.model = model
        self.dim = dim
        self.batch_size = batch_size

    def generate_embedding(self, text: str) -> list[float]:
        response = self.client.embeddings.create(
            input=text, model=self.model, dimensions=self.dim
        )
        return response.data[0].embedding

    def batch_embed(self, texts: list[str]) -> list[list[float]]:
        response = self.client.embeddings.create(
            input=texts, model=self.model, dimensions=self.dim
        )
        return [item.embedding for item in response.data]

    def write_documents(self, docs: list[dict]) -> int:
        written = 0
        for i in range(0, len(docs), self.batch_size):
            batch = docs[i : i + self.batch_size]
            texts = [d["content"] for d in batch]
            embeddings = self.batch_embed(texts)
            with Session(self.engine) as session:
                for doc, emb in zip(batch, embeddings):
                    record = KnowledgeDoc(
                        title=doc["title"],
                        content=doc["content"],
                        category=doc.get("category", "general"),
                        source=doc.get("source", ""),
                        embedding=emb,
                    )
                    session.add(record)
                session.commit()
            written += len(batch)
        return written


writer = EmbeddingWriter(engine, api_key="your-api-key")
docs = [
    {"title": "Refund Policy", "content": "Q2 2026 refund policy: full refund within 30 days of purchase...", "category": "policy"},
    {"title": "API Rate Limits", "content": "Free tier: 100 req/min, Paid tier: 1000 req/min...", "category": "tech"},
]
count = writer.write_documents(docs)
print(f"Wrote {count} documents")

Best for: Knowledge base bulk import, incremental document embedding ingestion.

Pattern 3: Semantic Retrieval with Scalar Filtering Hybrid Query

TiDB's core advantage — completing semantic retrieval and business condition filtering in a single SQL statement.

from dataclasses import dataclass


@dataclass
class SearchResult:
    id: int
    title: str
    content: str
    category: str
    similarity: float


class HybridRetriever:
    def __init__(self, engine, api_key: str,
                 model: str = "text-embedding-3-small"):
        self.engine = engine
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def search(self, query: str, category: str | None = None,
               date_after: str | None = None,
               top_k: int = 5) -> list[SearchResult]:
        query_emb = self.client.embeddings.create(
            input=query, model=self.model
        ).data[0].embedding

        emb_str = str(query_emb)
        sql = """
            SELECT id, title, content, category,
                   1 - vec_cosine_distance(embedding, %s) AS similarity
            FROM knowledge_docs
            WHERE 1=1
        """
        params: list = [emb_str]

        if category:
            sql += " AND category = %s"
            params.append(category)
        if date_after:
            sql += " AND created_at > %s"
            params.append(date_after)

        sql += " ORDER BY similarity DESC LIMIT %s"
        params.append(top_k)

        with self.engine.connect() as conn:
            rows = conn.execute(sql, params).fetchall()

        return [
            SearchResult(
                id=r[0], title=r[1], content=r[2],
                category=r[3], similarity=round(r[4], 4)
            )
            for r in rows
        ]


retriever = HybridRetriever(engine, api_key="your-api-key")
results = retriever.search(
    query="refund policy", category="policy", date_after="2026-01-01", top_k=3
)
for r in results:
    print(f"[{r.similarity:.3f}] {r.title} ({r.category})")

Best for: Semantic search with business conditions, multi-dimensional filtered retrieval.

Pattern 4: Full-Text Search & Vector Retrieval Fusion

Dual-recall with keyword exact matching + semantic understanding, fused with RRF algorithm.

from dataclasses import dataclass


@dataclass
class FusedResult:
    id: int
    title: str
    content: str
    rrf_score: float
    vector_rank: int
    fulltext_rank: int


class FusedRetriever:
    def __init__(self, engine, api_key: str, k: int = 60):
        self.engine = engine
        self.client = OpenAI(api_key=api_key)
        self.k = k

    def fused_search(self, query: str, top_k: int = 5) -> list[FusedResult]:
        query_emb = self.client.embeddings.create(
            input=query, model="text-embedding-3-small"
        ).data[0].embedding
        emb_str = str(query_emb)

        vec_sql = """
            SELECT id, title, content,
                   ROW_NUMBER() OVER (ORDER BY vec_cosine_distance(embedding, %s)) AS vec_rank
            FROM knowledge_docs
            ORDER BY vec_cosine_distance(embedding, %s)
            LIMIT 50
        """
        ft_sql = """
            SELECT id, title, content,
                   ROW_NUMBER() OVER (ORDER BY MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE)) AS ft_rank
            FROM knowledge_docs
            WHERE MATCH(content) AGAINST(%s IN NATURAL LANGUAGE MODE)
            LIMIT 50
        """

        with self.engine.connect() as conn:
            vec_rows = conn.execute(vec_sql, [emb_str, emb_str]).fetchall()
            ft_rows = conn.execute(ft_sql, [query, query]).fetchall()

        rrf_scores: dict[int, dict] = {}
        for row in vec_rows:
            doc_id = row[0]
            rrf_scores[doc_id] = {
                "title": row[1], "content": row[2],
                "vec_rank": row[3], "ft_rank": 9999,
                "rrf": 1.0 / (self.k + row[3]),
            }
        for row in ft_rows:
            doc_id = row[0]
            if doc_id in rrf_scores:
                rrf_scores[doc_id]["ft_rank"] = row[3]
                rrf_scores[doc_id]["rrf"] += 1.0 / (self.k + row[3])
            else:
                rrf_scores[doc_id] = {
                    "title": row[1], "content": row[2],
                    "vec_rank": 9999, "ft_rank": row[3],
                    "rrf": 1.0 / (self.k + row[3]),
                }

        sorted_results = sorted(
            rrf_scores.items(), key=lambda x: x[1]["rrf"], reverse=True
        )[:top_k]

        return [
            FusedResult(
                id=doc_id, title=v["title"], content=v["content"],
                rrf_score=round(v["rrf"], 5),
                vector_rank=v["vec_rank"], fulltext_rank=v["ft_rank"],
            )
            for doc_id, v in sorted_results
        ]


fused = FusedRetriever(engine, api_key="your-api-key")
results = fused.fused_search("refund policy 2026", top_k=5)
for r in results:
    print(f"[RRF:{r.rrf_score:.4f} V:{r.vector_rank} F:{r.fulltext_rank}] {r.title}")

Best for: Keyword + semantic dual-recall, exact matching for technical terms.

Pattern 5: End-to-End RAG Application Construction

Chain retrieval, reranking, and generation into a complete RAG pipeline.

from dataclasses import dataclass


@dataclass
class RAGConfig:
    retrieval_top_k: int = 10
    rerank_top_n: int = 3
    max_context_tokens: int = 3000
    llm_model: str = "gpt-4o-mini"
    embedding_model: str = "text-embedding-3-small"


class TiDBRAGPipeline:
    def __init__(self, engine, api_key: str, config: RAGConfig | None = None):
        self.config = config or RAGConfig()
        self.engine = engine
        self.client = OpenAI(api_key=api_key)
        self.retriever = HybridRetriever(
            engine, api_key, model=self.config.embedding_model
        )

    def _rerank(self, query: str, results: list[SearchResult]) -> list[SearchResult]:
        scored = []
        for r in results:
            overlap = sum(
                1 for w in query.split() if w.lower() in r.content.lower()
            ) / max(len(query.split()), 1)
            final_score = 0.7 * r.similarity + 0.3 * overlap
            scored.append((r, final_score))
        scored.sort(key=lambda x: x[1], reverse=True)
        return [r for r, _ in scored[: self.config.rerank_top_n]]

    def query(self, question: str, category: str | None = None) -> str:
        results = self.retriever.search(
            query=question, category=category,
            top_k=self.config.retrieval_top_k
        )
        reranked = self._rerank(question, results)
        context = "\n\n".join(
            f"[{r.title}] ({r.category})\n{r.content}" for r in reranked
        )
        if len(context) > self.config.max_context_tokens * 4:
            context = context[: self.config.max_context_tokens * 4]

        prompt = (
            "Answer the question based on the following retrieval results. "
            "If the context lacks sufficient information, state it clearly.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        response = self.client.chat.completions.create(
            model=self.config.llm_model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=800,
        )
        return response.choices[0].message.content


pipeline = TiDBRAGPipeline(engine, api_key="your-api-key")
answer = pipeline.query("What is the Q2 2026 refund policy?", category="policy")
print(answer)

Best for: Production-grade RAG Q&A systems, enterprise knowledge base assistants.

Pitfall Guide: 5 Common Traps

Trap 1: Querying vector columns without an index

❌ Wrong:

SELECT * FROM knowledge_docs
ORDER BY vec_cosine_distance(embedding, @query_emb)
LIMIT 10;

✅ Right:

ALTER TABLE knowledge_docs
ADD VECTOR INDEX idx_vec USING HNSW (embedding)
WITH (ef_construction = 128, m = 16);

-- Set ef_search at query time
SET SESSION tidb_vector_ef_search = 64;
SELECT * FROM knowledge_docs
ORDER BY vec_cosine_distance(embedding, @query_emb)
LIMIT 10;

Trap 2: Embedding dimension mismatch with index dimension

❌ Wrong:

embedding = Column(VectorType(768))  # Table defined as 768-dim
# But writing 1536-dim vectors from text-embedding-3-small

✅ Right:

EMBEDDING_DIM = 1536

class KnowledgeDoc(Base):
    embedding = Column(VectorType(EMBEDDING_DIM))

# Explicitly specify dimensions when generating embeddings
response = client.embeddings.create(
    input=text, model="text-embedding-3-small", dimensions=EMBEDDING_DIM
)

Trap 3: Not disabling index during bulk writes

❌ Wrong:

for doc in large_doc_list:
    session.add(KnowledgeDoc(embedding=doc["emb"]))
session.commit()  # Each write triggers index update

✅ Right:

# Disable vector index first, rebuild after bulk load
with engine.connect() as conn:
    conn.execute("ALTER TABLE knowledge_docs ALTER INDEX idx_vec INVISIBLE")
    conn.commit()

bulk_write(large_doc_list)

with engine.connect() as conn:
    conn.execute("ALTER TABLE knowledge_docs ALTER INDEX idx_vec VISIBLE")
    conn.commit()

Trap 4: Applying scalar filtering after vector ranking

❌ Wrong:

SELECT * FROM knowledge_docs
ORDER BY vec_cosine_distance(embedding, @emb)
LIMIT 100;
-- Then filter category in application code

✅ Right:

SELECT * FROM knowledge_docs
WHERE category = 'policy' AND created_at > '2026-01-01'
ORDER BY vec_cosine_distance(embedding, @emb)
LIMIT 10;

Trap 5: Feeding untruncated RAG context directly to LLM

❌ Wrong:

context = "\n".join(r.content for r in all_results)
prompt = f"Context: {context}\nQuestion: {question}"

✅ Right:

MAX_CTX_TOKENS = 3000

def truncate_context(results: list[SearchResult]) -> str:
    parts, total = [], 0
    for r in results:
        est_tokens = len(r.content) // 4
        if total + est_tokens > MAX_CTX_TOKENS:
            break
        parts.append(f"[{r.title}]\n{r.content}")
        total += est_tokens
    return "\n\n".join(parts)

Error Troubleshooting: 10 Common Errors

#	Error Message	Cause	Solution
1	`Vector dimension mismatch: expected 1536 got 768`	Embedding dimension doesn't match table definition	Unify embedding model dimensions or modify VectorType parameter
2	`HNSW index build OOM`	ef_construction too large or data exceeds memory	Lower ef_construction to 64, build index in batches
3	`vec_cosine_distance function not found`	TiDB version below 8.0 or vector feature not enabled	Upgrade TiDB to 8.0+, confirm vector plugin is loaded
4	`FULLTEXT index not found on column content`	Full-text index not created	`ALTER TABLE knowledge_docs ADD FULLTEXT INDEX ft_content(content)`
5	`Query timeout during vector search`	HNSW ef_search too large or data exceeds expectations	Lower ef_search, add scalar filtering to narrow scope
6	`Duplicate entry for key 'PRIMARY'`	ID conflict during bulk writes	Use auto_increment or specify ID ranges explicitly
7	`Embedding API rate limit exceeded`	Embedding API call frequency exceeds limit	Add request intervals, use batch API, implement backoff retry
8	`TiDB connection pool exhausted`	Too many concurrent queries, connections not released	Configure pool size, use with statements to ensure connection return
9	`Index invisible after bulk load`	Forgot to restore index visibility after bulk write	Execute `ALTER INDEX idx_vec VISIBLE` and verify
10	`RRF fusion returns empty results`	Both vector and full-text retrieval return no matches	Relax similarity threshold, add fallback retrieval strategy

Advanced Optimization: 4 Key Techniques

1. HNSW Parameter Auto-Tuning

class HNSWAutoTuner:
    def __init__(self, engine):
        self.engine = engine

    def recommend_params(self, row_count: int,
                         recall_target: float = 0.95) -> dict:
        if row_count < 100_000:
            return {"ef_construction": 64, "m": 8, "ef_search": 40}
        elif row_count < 1_000_000:
            return {"ef_construction": 128, "m": 16, "ef_search": 64}
        else:
            return {"ef_construction": 256, "m": 24, "ef_search": 128}

    def apply(self, table: str, params: dict):
        with self.engine.connect() as conn:
            conn.execute(
                f"ALTER TABLE {table} ADD VECTOR INDEX idx_vec "
                f"USING HNSW (embedding) WITH "
                f"(ef_construction={params['ef_construction']}, m={params['m']})"
            )
            conn.execute(
                f"SET SESSION tidb_vector_ef_search = {params['ef_search']}"
            )
            conn.commit()

2. Embedding Cache to Avoid Redundant Computation

import hashlib
import json


class EmbeddingCache:
    def __init__(self, engine, ttl_hours: int = 24):
        self.engine = engine
        self.ttl = ttl_hours
        self._local: dict[str, list[float]] = {}

    def _cache_key(self, text: str, model: str) -> str:
        raw = f"{model}:{text}"
        return hashlib.md5(raw.encode()).hexdigest()

    def get_or_compute(self, text: str, model: str,
                       compute_fn) -> list[float]:
        key = self._cache_key(text, model)
        if key in self._local:
            return self._local[key]
        emb = compute_fn(text, model)
        self._local[key] = emb
        return emb

3. Query Router: Auto-Select Retrieval Strategy

class QueryRouter:
    def route(self, query: str) -> str:
        has_exact_terms = any(
            kw in query.lower() for kw in ["id", "code", "version", "number"]
        )
        is_recent = any(
            kw in query.lower() for kw in ["latest", "recent", "today", "this week"]
        )
        if has_exact_terms and is_recent:
            return "fused"
        elif has_exact_terms:
            return "fulltext"
        elif is_recent:
            return "hybrid"
        else:
            return "vector"

4. Vector Search Performance Monitoring

class VectorSearchMonitor:
    def __init__(self, engine):
        self.engine = engine

    def get_index_stats(self, table: str) -> dict:
        with self.engine.connect() as conn:
            count = conn.execute(
                f"SELECT COUNT(*) FROM {table}"
            ).scalar()
            null_embs = conn.execute(
                f"SELECT COUNT(*) FROM {table} WHERE embedding IS NULL"
            ).scalar()
        return {
            "total_rows": count,
            "null_embeddings": null_embs,
            "coverage": round((count - null_embs) / count, 4) if count else 0,
        }

Comparison: 4 Vector Retrieval Solutions

Dimension	TiDB Vector	Pinecone	Milvus	pgvector
SQL+Vector Fusion	Native	Not supported	Scalar filtering	Native
ACID Transactions	Full ACID	None	Limited	Full ACID
HTAP Analytics	Row+Column storage	None	None	None
Horizontal Scaling	Auto-scaling	Auto	Manual sharding	Requires Citus
Index Types	HNSW	Auto	IVF/HNSW/DiskANN	HNSW/IVFFlat
Max Dimensions	16384	2048	32768	2000 (HNSW)
Full-Text Search	Native FULLTEXT	None	None	tsvector
Deployment	Self-hosted/Serverless	Cloud only	Self-hosted/Cloud	Self-hosted
Ops Complexity	Low (managed)	Very low	High	Medium
Best For	RAG+Business fusion	Pure vector SaaS	Large-scale vectors	PG ecosystem

TiDB Vector's core advantage: one database handles SQL + vector + full-text + transactions — no data sync pipelines needed.

Summary & Outlook

TiDB Vector Search is becoming a key infrastructure for RAG production deployment in 2026:

Serverless Vectors: TiDB Cloud Serverless bills by vector query volume — zero-ops RAG launch
Multimodal Vectors: Image and audio embeddings stored alongside text, cross-modal retrieval
Streaming Embedding Updates: CDC-driven automatic embedding recomputation, data changes reflected in vector index in real-time
Adaptive Indexing: Auto-adjusts HNSW parameters based on query patterns, no manual tuning
RAG Evaluation Standardization: Built-in retrieval quality metrics in TiDB, end-to-end RAG evaluation out of the box

Choosing TiDB vector RAG: if your RAG needs SQL filtering, transaction guarantees, or business data fusion, TiDB is the only one-stop solution. For pure vector retrieval, consider Pinecone; for ultra-large-scale vector stores, consider Milvus. Start with TiDB Cloud Serverless free tier to validate.

Recommended Online Tools

JSON Formatter — Format embedding vectors and retrieval result JSON data
Hash Calculator — Compute document fingerprint and embedding cache MD5/SHA hashes
Curl to Code — Convert TiDB API and embedding interface debug curls to Python code