How semantic search works: finding meaning, not keywords
Last updated: March 23, 2026
Why keyword search fails rulebooks
You're mid-game and you need to know what happens when two units occupy the same hex. You type "same hex two units" into a search box. A keyword engine looks for those exact words. If the rulebook says "stacking rules" or "zone of control" or "friendly unit limit," you get nothing.
That's the core problem with keyword search on rulebooks. Rules are written in the author's vocabulary, not yours. Combat might be called "conflict resolution." Moving might be called "displacement." Edge cases get buried in paragraphs titled something completely different from what you'd naturally search for.
Board Game Librarian doesn't search for your words. It searches for your meaning.
What an embedding is
An embedding is a list of numbers — specifically, 768 numbers — that represents the meaning of a piece of text. Not the words themselves. The meaning.
Text that means similar things gets similar numbers. "How do I attack?" and "combat resolution procedure" end up as two lists of numbers that are close together in mathematical space. "Cookie recipes" ends up very far away from both.
This isn't magic. It's a consequence of how the embedding model was trained: on enormous amounts of text, learning which concepts appear near each other, which contexts co-occur, which phrases are used interchangeably. The model encodes semantic relationships into geometry.
A 768-dimensional space is impossible to visualise, but the principle is the same as a 2D map. Points that are close to each other are semantically related. The search problem becomes: given my question, which rulebook chunks are closest in this space?
From words to numbers: the embedding process
Every piece of text in the system passes through the same model: jina-v2-small-en, running locally on a Python/Flask service. "Locally" matters — there's no external API call, no rate limit, no latency added by a third-party service.
When a rulebook is imported, the text gets split into chunks (a few hundred words each, with overlap to avoid cutting off context). Each chunk gets embedded. Those 768-number vectors are stored in PostgreSQL alongside the chunk text, using the pgvector extension.
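The splitting step can be sketched as follows. The chunk size and overlap here are illustrative stand-ins, not the system's actual parameters:

```python
def chunk_text(text: str, chunk_words: int = 300, overlap_words: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks.

    Sizes are illustrative; the real pipeline's chunk size and
    overlap are not specified in this article.
    """
    words = text.split()
    chunks = []
    step = chunk_words - overlap_words
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break
    return chunks

# A 700-word document yields chunks starting at words 0, 250, 500,
# with each consecutive pair sharing 50 words of overlap.
doc = " ".join(f"w{i}" for i in range(700))
parts = chunk_text(doc)
```

The overlap means a rule that straddles a chunk boundary appears whole in at least one chunk, at the cost of storing some text twice.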
When you ask a question, the same model embeds your question in real time. Now you have two things: a 768-number vector for your question, and thousands of 768-number vectors for every rulebook chunk. The search problem is finding the chunks whose vectors are closest to the question vector.
Closeness is measured by cosine similarity: the angle between two vectors. A score of 1.0 means the vectors point in the same direction (a perfect match); a score near 0 means the texts are unrelated. In practice, good matches score around 0.7–0.85, and the threshold for "relevant" sits around 0.5.
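The measure itself is a few lines of arithmetic. Here it is with toy 3-dimensional vectors standing in for real 768-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings (values invented for illustration):
question = [0.9, 0.1, 0.0]
combat_chunk = [0.8, 0.2, 0.1]   # similar direction -> score near 1
recipe_chunk = [0.0, 0.1, 0.9]   # different direction -> score near 0
```

Note that cosine similarity ignores vector length and compares only direction, which is why two texts of very different sizes can still score as a strong match.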
Three searches per question
Each chunk has three stored vectors, not one:
- main_vector — the embedding of the raw chunk text
- concepts_vector — the embedding of key concepts extracted from the chunk
- topic_vector — the embedding of the topic label assigned to the chunk
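The per-chunk storage can be pictured as a record carrying those three vectors side by side. Field names here are illustrative, not the actual pgvector schema:

```python
from dataclasses import dataclass

@dataclass
class ChunkRecord:
    """One rulebook chunk with its three embeddings.

    Field names are illustrative; the real database schema
    is not specified in this article.
    """
    chunk_id: int
    source_pdf: str
    text: str                      # raw chunk text
    main_vector: list[float]       # embedding of the raw text (768 dims)
    concepts_vector: list[float]   # embedding of extracted key concepts
    topic_vector: list[float]      # embedding of the assigned topic label

rec = ChunkRecord(
    chunk_id=1,
    source_pdf="catan.pdf",
    text="Stacking rules: no more than two friendly units per hex...",
    main_vector=[0.0] * 768,       # placeholder values
    concepts_vector=[0.0] * 768,
    topic_vector=[0.0] * 768,
)
```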
A question can match a chunk in different ways. "How does flanking work?" might match the main text because it contains "flanking bonus" explicitly. It might match the concepts vector because "movement advantage" is a related concept. It might match the topic vector because the chunk is categorised under "combat mechanics."
Running three parallel searches and merging the results improves recall — you're more likely to find the right chunk even if one search modality misses it. The three result sets are merged and de-duplicated before re-ranking.
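The merge step is simple in outline: combine the three hit lists, keep each chunk once at its best score, and sort. A minimal sketch, with invented scores:

```python
def merge_results(*result_sets: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Merge (chunk_id, score) lists from the three searches,
    keeping the best score per chunk, sorted best-first."""
    best: dict[int, float] = {}
    for results in result_sets:
        for chunk_id, score in results:
            if score > best.get(chunk_id, 0.0):
                best[chunk_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

main_hits = [(7, 0.82), (3, 0.61)]
concept_hits = [(7, 0.74), (9, 0.66)]
topic_hits = [(3, 0.70)]
merged = merge_results(main_hits, concept_hits, topic_hits)
# chunk 7 appears once at 0.82; chunk 3 keeps its higher score, 0.70
```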
The latency cost is marginal. Most searches complete in under 100ms.
Query expansion
Before embedding your question, there's an optional step: query expansion via the corpus-analyzer service, which runs BERTopic topic modelling.
Your question might be short — "can I move after attacking?" That's five words. Expanding it to include related terms ("movement restrictions," "action sequence," "post-attack movement rules") gives the embedding model more signal.
Query expansion is applied selectively, not universally. Short questions with clear topic signals benefit most. Long, already-detailed questions don't need it.
One thing that sometimes confuses people: query expansion happens before embedding, so the expanded query still produces a single vector. It's not splitting the search into multiple queries — it's making one query richer.
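That "one richer query" idea can be sketched as follows. The expansion terms would come from the corpus-analyzer's topic model; here they are passed in directly, and the word-count threshold is an invented stand-in:

```python
def expand_query(question: str, related_terms: list[str], max_words: int = 12) -> str:
    """Enrich a short question with related terms before embedding.

    The threshold and the source of related_terms are illustrative;
    the real selection logic is not specified in this article.
    """
    if len(question.split()) > max_words:
        return question  # already detailed: skip expansion
    return question + " " + " ".join(related_terms)

expanded = expand_query(
    "can I move after attacking?",
    ["movement restrictions", "action sequence", "post-attack movement rules"],
)
# The enriched string is embedded once: still a single query vector.
```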
How results are ranked and selected
After the three searches return results and they're merged, you have a pool of candidate chunks, each with a cosine similarity score. Sorting by score is only the start.
A few factors adjust the final ranking:
- Chunks from the same source PDF as the top result get a small boost (if the top chunk is relevant, neighbouring sections probably are too)
- Duplicate or near-duplicate chunks are collapsed (chunk overlap means you'd otherwise see the same paragraph twice)
- Very short chunks below a minimum word count are filtered out (headers and captions aren't substantive rule explanations)
The top 10 chunks after this process are passed to the language model. Not 100. Ten.
Send too few and you risk missing context. Send too many and the language model drowns in noise, costs more tokens, and runs slower. Testing showed diminishing returns past 10 for most questions.
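The three adjustments and the top-10 cut can be sketched together. The boost size and minimum length are illustrative values, and the sketch collapses only exact duplicates (near-duplicate detection would need a fuzzier comparison):

```python
def rerank(candidates, top_source: str, boost: float = 0.05,
           min_words: int = 10, top_k: int = 10):
    """Apply the ranking adjustments described above.

    Each candidate is (chunk_id, source_pdf, text, score).
    Boost size, min length, and top_k are illustrative.
    """
    seen_texts = set()
    adjusted = []
    for chunk_id, source, text, score in candidates:
        if len(text.split()) < min_words:
            continue                 # drop headers and captions
        if text in seen_texts:
            continue                 # collapse exact duplicates
        seen_texts.add(text)
        if source == top_source:
            score += boost           # neighbour boost: same PDF as top result
        adjusted.append((chunk_id, score))
    adjusted.sort(key=lambda item: item[1], reverse=True)
    return adjusted[:top_k]

cands = [
    (1, "catan.pdf", "a long rule paragraph about trading resources fairly with other players", 0.80),
    (2, "catan.pdf", "Setup", 0.78),  # too short: filtered out
    (3, "catan.pdf", "a long rule paragraph about trading resources fairly with other players", 0.77),
    (4, "seafarers.pdf", "another long rule paragraph about moving ships across sea hexes safely", 0.79),
]
ranked = rerank(cands, top_source="catan.pdf")
# The header is filtered, the duplicate collapsed, and chunk 1 boosted.
```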
Cross-language search
This surprises people: you can ask a question in Italian and get an answer sourced from an English rulebook, with no manual translation step.
jina-v2-small-en, despite the "en" in its name, handles multiple languages. It was trained on multilingual text, so it learned cross-lingual semantic relationships. "Come funziona il combattimento?" and "how does combat work?" end up as nearby vectors even though the words are completely different.
Italian questions match English rulebook chunks. German questions match Spanish rulebook chunks. The cross-lingual property is strongest for high-resource languages — English, Italian, German, French, Spanish, Portuguese, Russian, Japanese, Polish, Chinese — but it works well enough that users don't need to think about it.
The language model then responds in your question's language. The search happens in multilingual embedding space; the answer comes back in whatever language you asked in.
The approximate nearest neighbour trade-off
Finding the closest vectors by brute force means comparing your question vector against every stored chunk vector. That scales linearly with the number of chunks, and in a space of millions of vectors it gets impractical fast.
Board Game Librarian uses HNSW: Hierarchical Navigable Small World graphs. It organises vectors into a graph structure that lets you navigate quickly to the nearest neighbours without checking everything.
The "approximate" in "approximate nearest neighbour" is the trade-off. HNSW doesn't guarantee it finds the absolute closest vector. It might miss the #1 match and return #2 or #3 instead. For question-answering, this rarely matters. If two chunks score 0.81 and 0.80, they're both highly relevant. The tiny recall loss is invisible to users.
The speed gain is dramatic. Searches that might take seconds with brute force complete in milliseconds. For a system handling live questions, this isn't negotiable.
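In pgvector, switching to HNSW is a matter of creating the right index; the query itself stays the same. A hedged sketch of the SQL involved, with an assumed table name (`chunks`) and column name (`main_vector`). The `vector_cosine_ops` operator class and the `<=>` cosine-distance operator are real pgvector syntax:

```python
# Table and column names are assumptions for illustration.
CREATE_INDEX_SQL = """
CREATE INDEX ON chunks
USING hnsw (main_vector vector_cosine_ops);
"""

# `<=>` returns cosine distance (1 - cosine similarity), so ordering
# by ascending distance returns the most similar chunks first. With
# the HNSW index in place, this ORDER BY ... LIMIT query uses the
# graph instead of scanning every row.
SEARCH_SQL = """
SELECT chunk_id, 1 - (main_vector <=> %(query_vec)s) AS similarity
FROM chunks
ORDER BY main_vector <=> %(query_vec)s
LIMIT 10;
"""
```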
Where semantic search struggles
Semantic search isn't infallible. Knowing its weaknesses helps set expectations.
Highly specific references. If a user asks about "the third ability on the Tower card," semantic search may score the right chunk lower than you'd expect. Specific card names and narrow rule references are context-starved in embedding space — the model hasn't seen enough signal to know exactly where they sit relative to your question.
First query on a new game. The system processes PDFs on-demand. When a game's rulebook hasn't been chunked and embedded yet (because nobody has asked about it before), the first query gets a RAW-FIRST fallback: the full extracted text is used directly, without vector search. Subsequent queries use proper semantic search. This only affects the very first question for any given game.
Unusual terminology. Some games use invented vocabulary. A game that invents "floortile actions" as a mechanic name may see weaker matches simply because the embedding model has no prior context for that term. Query expansion helps here, but it's not a complete solution.
Very short questions. "Can I?" or "what about resources?" don't give the embedding model enough signal. Specific questions perform better: "can I use a resource card on an opponent's turn?"
Common questions
Does the system search across all games at once? No. When you select a game, the search is scoped to that game's rulebook chunks. This keeps results focused and prevents a chunk from one game appearing in an answer about another.
What if the rulebook doesn't contain the answer? The language model is instructed not to invent answers. If the retrieved chunks don't cover the question, the answer will say so — typically noting that the rulebook doesn't address this scenario directly, and sometimes pointing to where a related rule does exist.
Why does the answer cite specific page numbers? Page references are derived from character positions in the extracted text, mapped back to the original PDF structure. They're accurate to within 1–2 pages — not perfect, but close enough to let you verify in the physical book.
Can the search handle questions about game variants or expansions? Only if the expansion rulebook has been imported. Each PDF is a separate set of chunks. If the expansion isn't in the system, its rules aren't searchable.
Is the embedding model the same one used to write the answer? No. The embedding model (jina-v2-small-en) only produces vectors — it doesn't generate text. The language model (GPT-4o or Claude via OpenRouter) reads the retrieved chunks and writes the answer. They're separate systems with different jobs.