Vector Search & Embeddings

Last updated: March 24, 2026

How semantic search finds the right chunks

Traditional keyword search finds exact word matches. Vector search finds chunks that mean the same thing as your question, even if they use completely different words. That is the core of why I can answer "what happens if I run out of cards" when the rulebook says "when the draw pile is exhausted".


The HNSW index is what makes this fast. Without it, a full-table cosine similarity scan over millions of 768-dimensional vectors would take seconds. With HNSW, it takes 10-50 milliseconds.
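To make the cost of the unindexed path concrete, here is a minimal sketch of a brute-force cosine similarity scan. The chunk names and the 3-dimensional toy vectors are made up for illustration; real embeddings here have 768 dimensions, and this is not the production code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, chunks, k=2):
    """Score every stored vector against the query, then keep the best k."""
    scored = [(cosine_similarity(query, vec), chunk_id)
              for chunk_id, vec in chunks.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Hypothetical toy vectors standing in for real 768-dimensional embeddings.
chunks = {
    "draw_pile_exhausted": [0.9, 0.1, 0.0],
    "scoring_round_end": [0.0, 0.2, 0.9],
    "reshuffle_discards": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "what happens if I run out of cards"
top = brute_force_top_k(query, chunks, k=2)
```

Every query touches every stored vector, so the work grows linearly with the number of chunks. That linear cost is what the index avoids.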

HNSW: Hierarchical Navigable Small World

HNSW is an approximate nearest-neighbor algorithm. It builds a multi-layer graph in which each node is a vector. At query time it starts from an entry point in the sparse top layer and follows edges to progressively closer neighbors, dropping down a layer whenever it can get no closer, until it reaches the dense bottom layer. The "approximate" part means it might miss the absolute nearest neighbor 1-2% of the time in exchange for being roughly 100x faster than exact search.

For rules questions, this tradeoff is excellent. Missing the single most relevant chunk 1-2% of the time does not meaningfully affect answer quality, because the search still returns the other top candidates.
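The graph navigation can be sketched on a single layer. This is a deliberately simplified illustration, not the full algorithm: real HNSW stacks several layers and keeps a list of candidates rather than a single current node. The node names, coordinates, and edges below are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(vectors, neighbors, entry, query):
    """Walk the graph, always moving to the neighbor closest to the query.

    Stops when no neighbor improves on the current node, so the result is a
    local optimum: usually, but not always, the true nearest neighbor.
    """
    current = entry
    current_dist = squared_distance(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for candidate in neighbors[current]:
            d = squared_distance(vectors[candidate], query)
            if d < best_dist:
                best, best_dist = candidate, d
        if best == current:
            return current
        current, current_dist = best, best_dist

# Hypothetical toy graph: five points on a line, each linked to its neighbors.
vectors = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0), "E": (4, 0)}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D"]}
nearest = greedy_search(vectors, neighbors, entry="A", query=(3.2, 0))
```

The walk only ever looks at the neighbors of the node it is standing on, which is why it is fast, and also why it can stop at a node that is close but not the closest.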

Why 768 dimensions?

jina-v2-small-en produces 768-dimensional vectors. The choice reflects:

  • Semantic resolution: 768 dimensions capture fine-grained meaning differences that 256- or 384-dim models miss.
  • Index size: each 768-dim vector takes about 3 KB when stored as 4-byte floats (768 × 4 bytes), before index overhead. Manageable at the scale of millions of chunks.
  • Speed: Still fast enough for sub-100ms HNSW queries.
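The storage arithmetic behind the index-size point can be checked in a few lines. This sketch counts raw vector bytes only and ignores row and index overhead. The per-vector figure depends on how many bytes each component occupies, so both 4-byte and 8-byte floats are shown; which width applies to a given store is an assumption to verify.

```python
def raw_vector_bytes(dimensions, bytes_per_component=4):
    """Raw size of one vector, ignoring row and index overhead."""
    return dimensions * bytes_per_component

def raw_storage_gib(dimensions, chunk_count, bytes_per_component=4):
    """Raw size of all vectors, in GiB."""
    total = raw_vector_bytes(dimensions, bytes_per_component) * chunk_count
    return total / 1024 ** 3

float32_bytes = raw_vector_bytes(768)                         # 4-byte components
float64_bytes = raw_vector_bytes(768, bytes_per_component=8)  # 8-byte components
million_chunks_gib = raw_storage_gib(768, 1_000_000)
```

With 4-byte components, 768 dimensions come to 3,072 bytes per vector, so a million chunks need just under 3 GiB of raw vector data.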

Embedding models

All embeddings run through the embedding-api service (port 3462), a Python/Flask wrapper around the sentence-transformers library. The service runs locally, so no external API call is needed at query time.
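A client call to the service might look roughly like the sketch below. Only the port comes from the description above. The localhost host, the "/embed" route, the {"text": ...} request body, and the {"embedding": [...]} response shape are all assumptions made for illustration, so check them against the actual service before relying on this.

```python
import json
import urllib.request

# Port 3462 is documented; the host and the "/embed" route are hypothetical.
EMBEDDING_API_URL = "http://localhost:3462/embed"

def build_embed_request(text):
    """Build (but do not send) a POST request for one piece of text."""
    body = json.dumps({"text": text}).encode("utf-8")  # payload shape is assumed
    return urllib.request.Request(
        EMBEDDING_API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(text, timeout=5.0):
    """Send the request and read an assumed {"embedding": [...]} reply."""
    request = build_embed_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["embedding"]

request = build_embed_request("what happens if I run out of cards")
```

Separating request construction from sending keeps the payload easy to inspect without a running service.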

Performance characteristics

Operation                          Typical latency
Embed a question (100 tokens)      80-150ms
HNSW search (top-20 candidates)    10-50ms
Filter by game_id + threshold      <5ms
Total vector retrieval             ~100-200ms
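The filter step in the table is cheap because it only touches the handful of candidates the index returned. A minimal sketch, assuming each candidate is a dict with game_id and similarity keys; the chunk_id field, the game names, and the 0.6 threshold are hypothetical values chosen for the example.

```python
def filter_candidates(candidates, game_id, min_similarity):
    """Keep candidates for one game whose similarity clears the threshold."""
    return [
        c for c in candidates
        if c["game_id"] == game_id and c["similarity"] >= min_similarity
    ]

# Hypothetical candidates, as they might come back from the HNSW search.
candidates = [
    {"chunk_id": 1, "game_id": "game_a", "similarity": 0.82},
    {"chunk_id": 2, "game_id": "game_b", "similarity": 0.91},
    {"chunk_id": 3, "game_id": "game_a", "similarity": 0.41},
    {"chunk_id": 4, "game_id": "game_a", "similarity": 0.67},
]
kept = filter_candidates(candidates, game_id="game_a", min_similarity=0.6)
kept_ids = [c["chunk_id"] for c in kept]
```

Chunk 2 is dropped for belonging to another game and chunk 3 for falling below the threshold, leaving chunks 1 and 4.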

The vector database

I store embeddings in the chunk_embeddings table in PostgreSQL using the pgvector extension. The main vector column (main_vector) uses an HNSW index. Forum thread embeddings and PDF chunk embeddings share the same table but are distinguished by chunk_type.
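For orientation, this is roughly what the index definition and a top-20 query could look like with pgvector. The table name, the main_vector column, and chunk_type come from the description above. The index name, the chunk_id column, and the choice of the cosine operator class are assumptions. The sketch only builds the SQL and parameters, shaped for a psycopg-style cursor.execute(sql, params) call, and does not connect to a database.

```python
# pgvector HNSW index using the cosine operator class (assumed choice).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS chunk_embeddings_main_vector_hnsw
ON chunk_embeddings USING hnsw (main_vector vector_cosine_ops);
"""

# "<=>" is pgvector's cosine distance operator; similarity is 1 - distance.
SEARCH_SQL = """
SELECT chunk_id, chunk_type,
       1 - (main_vector <=> %(query)s::vector) AS similarity
FROM chunk_embeddings
ORDER BY main_vector <=> %(query)s::vector
LIMIT %(limit)s;
"""

def to_vector_literal(values):
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def build_search(query_vector, limit=20):
    """Return the SQL string and parameters for a nearest-neighbor query."""
    params = {"query": to_vector_literal(query_vector), "limit": limit}
    return SEARCH_SQL, params

sql, params = build_search([0.1, 0.2, 0.3])
```

Ordering by the distance operator directly, with a LIMIT, is the query shape that lets PostgreSQL use the HNSW index instead of scanning the whole table.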