Vector search & embeddings

Last updated: April 11, 2026

How semantic search finds the right chunks

Traditional keyword search finds exact word matches. Vector search finds chunks that mean the same thing as your question, even if they use completely different words. That is why I can answer "what happens if I run out of cards" when the rulebook says "when the draw pile is exhausted".


The HNSW index is what makes this fast. Without it, a full-table cosine similarity scan over millions of vectors would take seconds. With HNSW, it takes 10–50 milliseconds.
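For comparison, the exact scan that HNSW avoids can be sketched in a few lines. This is a minimal illustration in pure Python, not the production code; the function names `cosine_similarity` and `exact_top_k` are assumptions made for this example. Every stored vector is scored against the query, which is why the cost grows linearly with the number of chunks.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score every vector against the query and return the k best (similarity, index) pairs."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy 2-dimensional data; real embeddings have hundreds of dimensions.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_ranking = [index for _, index in exact_top_k([1.0, 0.1], toy_vectors, k=3)]
```

The scan touches all stored vectors on every query, so doubling the number of chunks doubles the work. An index trades a small amount of accuracy to avoid that.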

HNSW: Hierarchical Navigable Small World

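HNSW is an approximate nearest-neighbor algorithm. It builds a multi-layer graph where each node is a vector: the upper layers are sparse and act as long-range shortcuts, while the bottom layer contains every vector. At query time it starts from a fixed entry point in the top layer, follows edges to progressively closer neighbors, and descends layer by layer until it reaches the bottom. The "approximate" part means it might miss the absolute nearest neighbor 1–2% of the time in exchange for being roughly 100x faster than exact search.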

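For rules questions, this trade-off is excellent. Occasionally missing the single most relevant chunk does not meaningfully affect answer quality, because the search returns the top 20 candidates and the next-closest chunks are usually still among them.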
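The core move inside HNSW, greedy navigation toward the query, can be sketched on a single graph layer. This is a simplified illustration and not a full HNSW implementation: it omits the layer hierarchy and the candidate queue, it uses Euclidean distance on toy data, and the names `greedy_search`, `toy_points`, and `toy_neighbors` are assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(vectors, neighbors, entry, query):
    """Walk the graph from `entry`, always moving to the neighbor closest to `query`.

    Stops when no neighbor is closer than the current node and returns
    (node_index, distance). The result is a local optimum, not a guaranteed
    global nearest neighbor.
    """
    current = entry
    current_dist = euclidean(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for candidate in neighbors[current]:
            d = euclidean(vectors[candidate], query)
            if d < best_dist:
                best, best_dist = candidate, d
        if best == current:
            return current, current_dist
        current, current_dist = best, best_dist

# Toy graph: five points on a line, each linked to its immediate neighbors.
toy_points = [[0.0], [1.0], [2.0], [3.0], [4.0]]
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
found_node, found_dist = greedy_search(toy_points, toy_neighbors, entry=0, query=[3.2])
```

Because the walk stops at the first node with no closer neighbor, it can settle on a near match instead of the true nearest vector. That is where the "approximate" in approximate nearest-neighbor comes from, and the upper layers of a real HNSW graph exist to make the walk start close to the right region.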

Why 768 dimensions?

jina-v2-small-en produces 768-dimensional vectors. The choice reflects:

  • Semantic resolution: 768 dimensions capture fine-grained meaning differences that smaller models miss.
  • Index size: manageable at the scale of millions of chunks.
  • Speed: still fast enough for sub-100ms queries.
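The index-size point can be made concrete with a little arithmetic. The sketch below assumes each dimension is stored as a 4-byte float and counts raw vector data only; graph links and per-row overhead are not included, and the function name `raw_vector_bytes` is hypothetical.

```python
def raw_vector_bytes(num_vectors, dims=768, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming 4-byte floats (an assumption)."""
    return num_vectors * dims * bytes_per_value

per_vector = raw_vector_bytes(1)             # 768 * 4 = 3072 bytes
one_million = raw_vector_bytes(1_000_000)    # 3,072,000,000 bytes
one_million_gib = one_million / 1024 ** 3    # about 2.86 GiB
```

Under these assumptions, a million chunks need roughly 3 GB for the vectors themselves, which is why 768 dimensions stays manageable at that scale.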

Embedding service

All embeddings are generated by a local embedding service, a Python wrapper around the sentence-transformers library. Because it runs locally, no external API call is needed at query time.
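A wrapper of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual service: the class name `EmbeddingService` and the helper `load_encoder` are hypothetical, and the model identifier is left as a parameter because the exact name used in production is not given here. The only library calls relied on are `SentenceTransformer(model_name)` and its `encode` method with `normalize_embeddings=True`.

```python
def load_encoder(model_name):
    """Load a sentence-transformers model.

    The import is done lazily so this module can be loaded (and tested with a
    stand-in encoder) on machines where the library is not installed.
    """
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name)

class EmbeddingService:
    """Thin wrapper that turns text into plain lists of floats (hypothetical design)."""

    def __init__(self, encoder):
        # `encoder` is any object with an encode(sentences, normalize_embeddings=...) method.
        self._encoder = encoder

    def embed(self, texts):
        """Embed a batch of strings; returns one list of floats per input string."""
        vectors = self._encoder.encode(list(texts), normalize_embeddings=True)
        return [[float(x) for x in row] for row in vectors]
```

In use, the service would be built once at startup, for example `EmbeddingService(load_encoder(model_name))`, and reused for every question so the model is not reloaded per request. Normalizing the vectors at encode time is a design choice assumed here; it makes cosine similarity equivalent to a dot product.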

Performance characteristics

Operation                          Typical latency
Embed a question (100 tokens)      80–150ms
HNSW search (top-20 candidates)    10–50ms
Filter by game + threshold         <5ms
Total vector retrieval             ~100–200ms
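The filter step in the table is cheap because it only touches the handful of candidates the index returned. A minimal sketch, assuming each candidate carries a game name and a similarity score; the `Candidate` fields, the function name, and the threshold value in the example are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    chunk_id: int
    game: str
    similarity: float  # higher means closer to the question

def filter_candidates(candidates, game, min_similarity):
    """Keep candidates for one game at or above the threshold, best match first."""
    kept = [c for c in candidates if c.game == game and c.similarity >= min_similarity]
    return sorted(kept, key=lambda c: c.similarity, reverse=True)

# Illustrative values only; the real threshold is not stated in this document.
sample = [
    Candidate(1, "catan", 0.82),
    Candidate(2, "carcassonne", 0.90),
    Candidate(3, "catan", 0.41),
    Candidate(4, "catan", 0.88),
]
kept_ids = [c.chunk_id for c in filter_candidates(sample, "catan", min_similarity=0.5)]
```

With at most 20 candidates coming out of the HNSW search, this is a single pass over a very short list, which is consistent with the sub-5ms figure.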

The vector database

Embeddings are stored in PostgreSQL using the pgvector extension. Forum thread embeddings and PDF chunk embeddings share the same storage but are distinguished by type.
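As a rough illustration of how this could look with pgvector, the sketch below holds the SQL as Python strings. The table name `chunks`, its columns, the index name, and the `source_type` values are assumptions made for this example, not the actual schema, and the `%(query)s` placeholder follows the psycopg parameter style, which is also an assumption. The pgvector features used are the `vector(n)` column type, the `hnsw` index method with `vector_cosine_ops`, and the `<=>` cosine distance operator.

```python
# Hypothetical schema: table, column, and index names are illustrative assumptions.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id          bigserial PRIMARY KEY,
    source_type text NOT NULL,          -- e.g. 'forum_thread' or 'pdf_chunk'
    game        text NOT NULL,
    content     text NOT NULL,
    embedding   vector(768) NOT NULL
);
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is cosine distance, so similarity is 1 minus the distance.
SEARCH_SQL = """
SELECT id, source_type, game, content,
       1 - (embedding <=> %(query)s::vector) AS similarity
FROM chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT 20;
"""

def to_pgvector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

A query would pass `{"query": to_pgvector_literal(question_vector)}` as parameters to `SEARCH_SQL`. Keeping both content types in one table with a type column, as assumed here, lets a single index serve forum threads and PDF chunks alike.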