How semantic search works: finding meaning, not keywords
Last updated: March 23, 2026
Why keyword search fails rulebooks
You're mid-game and you need to know what happens when two units occupy the same hex. You type "same hex two units" into a search box. A keyword engine looks for those exact words. If the rulebook says "stacking rules" or "zone of control" or "friendly unit limit," you get nothing.
That's the core problem with keyword search on rulebooks. Rules are written in the author's vocabulary, not yours. Combat might be called "conflict resolution." Moving might be called "displacement." Edge cases get buried in paragraphs titled something completely different from what you'd naturally search for.
Board Game Librarian doesn't search for your words. It searches for your meaning.
What an embedding is
An embedding is a list of numbers — specifically, 768 numbers — that represents the meaning of a piece of text. Not the words themselves. The meaning.
Text that means similar things gets similar numbers. "How do I attack?" and "combat resolution procedure" end up as two lists of numbers that are close together in mathematical space. "Cookie recipes" ends up very far away from both.
This isn't magic. It's a consequence of how the embedding model was trained: on enormous amounts of text, learning which concepts appear near each other, which contexts co-occur, which phrases are used interchangeably. The model encodes semantic relationships into geometry.
A 768-dimensional space is impossible to visualise, but the principle is the same as a 2D map. Points that are close to each other are semantically related. The search problem becomes: given my question, which rulebook chunks are closest in this space?
From words to numbers: the embedding process
Every piece of text in the system passes through the same model: jina-v2-small-en, running locally on a Python/Flask service. "Locally" matters — there's no external API call, no rate limit, no latency added by a third-party service.
When a rulebook is imported, the text gets split into chunks (a few hundred words each, with overlap to avoid cutting off context). Each chunk gets embedded. Those 768-number vectors are stored in PostgreSQL alongside the chunk text, using the pgvector extension.
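The splitting step can be sketched as follows. The chunk size and overlap here are illustrative stand-ins, not the system's actual parameters:

```python
def chunk_text(text: str, chunk_words: int = 300, overlap_words: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks.

    Sizes are illustrative; the real pipeline's chunk size and
    overlap are not specified in this article.
    """
    words = text.split()
    chunks = []
    step = chunk_words - overlap_words
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break
    return chunks

# A 700-word document yields chunks starting at words 0, 250, 500,
# with each consecutive pair sharing 50 words of overlap.
doc = " ".join(f"w{i}" for i in range(700))
parts = chunk_text(doc)
```

The overlap means a rule that straddles a chunk boundary appears whole in at least one chunk, at the cost of storing some text twice.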
When you ask a question, the same model embeds your question in real time. Now you have two things: a 768-number vector for your question, and thousands of 768-number vectors for every rulebook chunk. The search problem is finding the chunks whose vectors are closest to the question vector.
Closeness is measured by cosine similarity: the angle between two vectors. A score of 1.0 means the vectors point in the same direction (a perfect match); a score near 0 means the texts are unrelated. In practice, good matches score around 0.7–0.85, and the threshold for "relevant" sits around 0.5.
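The measure itself is a few lines of arithmetic. Here it is with toy 3-dimensional vectors standing in for real 768-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real embeddings (values invented for illustration):
question = [0.9, 0.1, 0.0]
combat_chunk = [0.8, 0.2, 0.1]   # similar direction -> score near 1
recipe_chunk = [0.0, 0.1, 0.9]   # different direction -> score near 0
```

Note that cosine similarity ignores vector length and compares only direction, which is why two texts of very different sizes can still score as a strong match.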
Three searches per question
Each chunk has three stored vectors, not one:
- main_vector — the embedding of the raw chunk text
- concepts_vector — the embedding of key concepts extracted from the chunk
- topic_vector — the embedding of the topic label assigned to the chunk
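The per-chunk storage can be pictured as a record carrying those three vectors side by side. Field names here are illustrative, not the actual pgvector schema:

```python
from dataclasses import dataclass

@dataclass
class ChunkRecord:
    """One rulebook chunk with its three embeddings.

    Field names are illustrative; the real database schema
    is not specified in this article.
    """
    chunk_id: int
    source_pdf: str
    text: str                      # raw chunk text
    main_vector: list[float]       # embedding of the raw text (768 dims)
    concepts_vector: list[float]   # embedding of extracted key concepts
    topic_vector: list[float]      # embedding of the assigned topic label

rec = ChunkRecord(
    chunk_id=1,
    source_pdf="catan.pdf",
    text="Stacking rules: no more than two friendly units per hex...",
    main_vector=[0.0] * 768,       # placeholder values
    concepts_vector=[0.0] * 768,
    topic_vector=[0.0] * 768,
)
```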
A question can match a chunk in different ways. "How does flanking work?" might match the main text because it contains "flanking bonus" explicitly. It might match the concepts vector because "movement advantage" is a related concept. It might match the topic vector because the chunk is categorised under "combat mechanics."
Running three parallel searches and merging the results improves recall — you're more likely to find the right chunk even if one search modality misses it. The three result sets are merged and de-duplicated before re-ranking.
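The merge step is simple in outline: combine the three hit lists, keep each chunk once at its best score, and sort. A minimal sketch, with invented scores:

```python
def merge_results(*result_sets: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Merge (chunk_id, score) lists from the three searches,
    keeping the best score per chunk, sorted best-first."""
    best: dict[int, float] = {}
    for results in result_sets:
        for chunk_id, score in results:
            if score > best.get(chunk_id, 0.0):
                best[chunk_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

main_hits = [(7, 0.82), (3, 0.61)]
concept_hits = [(7, 0.74), (9, 0.66)]
topic_hits = [(3, 0.70)]
merged = merge_results(main_hits, concept_hits, topic_hits)
# chunk 7 appears once at 0.82; chunk 3 keeps its higher score, 0.70
```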
The latency cost is marginal. Most searches complete in under 100ms.
Query expansion
Before embedding your question, there's an optional step: query expansion via the corpus-analyzer service, which runs BERTopic topic modelling.
Your question might be short — "can I move after attacking?" That's five words. Expanding it to include related terms ("movement restrictions," "action sequence," "post-attack movement rules") gives the embedding model more signal.
Query expansion is applied selectively, not universally. Short questions with clear topic signals benefit most. Long, already-detailed questions don't need it.
One thing that sometimes confuses people: query expansion happens before embedding, so the expanded query still produces a single vector. It's not splitting the search into multiple queries — it's making one query richer.
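That "one richer query" idea can be sketched as follows. The expansion terms would come from the corpus-analyzer's topic model; here they are passed in directly, and the word-count threshold is an invented stand-in:

```python
def expand_query(question: str, related_terms: list[str], max_words: int = 12) -> str:
    """Enrich a short question with related terms before embedding.

    The threshold and the source of related_terms are illustrative;
    the real selection logic is not specified in this article.
    """
    if len(question.split()) > max_words:
        return question  # already detailed: skip expansion
    return question + " " + " ".join(related_terms)

expanded = expand_query(
    "can I move after attacking?",
    ["movement restrictions", "action sequence", "post-attack movement rules"],
)
# The enriched string is embedded once: still a single query vector.
```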
How results are ranked and selected
After the three searches return results and they're merged, you have a pool of candidate chunks, each with a cosine similarity score. Sorting by score is only the start.
A few factors adjust the final ranking:
- Chunks from the same source PDF as the top result get a small boost (if the top chunk is relevant, neighbouring sections probably are too)
- Duplicate or near-duplicate chunks are collapsed (chunk overlap means you'd otherwise see the same paragraph twice)
- Very short chunks below a minimum word count are filtered out (headers and captions aren't substantive rule explanations)
The top 10 chunks after this process are passed to the language model. Not 100. Ten.
Send too few and you risk missing context. Send too many and the language model drowns in noise, costs more tokens, and runs slower. Testing showed diminishing returns past 10 for most questions.
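The three adjustments and the top-10 cut can be sketched together. The boost size and minimum length are illustrative values, and the sketch collapses only exact duplicates (near-duplicate detection would need a fuzzier comparison):

```python
def rerank(candidates, top_source: str, boost: float = 0.05,
           min_words: int = 10, top_k: int = 10):
    """Apply the ranking adjustments described above.

    Each candidate is (chunk_id, source_pdf, text, score).
    Boost size, min length, and top_k are illustrative.
    """
    seen_texts = set()
    adjusted = []
    for chunk_id, source, text, score in candidates:
        if len(text.split()) < min_words:
            continue                 # drop headers and captions
        if text in seen_texts:
            continue                 # collapse exact duplicates
        seen_texts.add(text)
        if source == top_source:
            score += boost           # neighbour boost: same PDF as top result
        adjusted.append((chunk_id, score))
    adjusted.sort(key=lambda item: item[1], reverse=True)
    return adjusted[:top_k]

cands = [
    (1, "catan.pdf", "a long rule paragraph about trading resources fairly with other players", 0.80),
    (2, "catan.pdf", "Setup", 0.78),  # too short: filtered out
    (3, "catan.pdf", "a long rule paragraph about trading resources fairly with other players", 0.77),
    (4, "seafarers.pdf", "another long rule paragraph about moving ships across sea hexes safely", 0.79),
]
ranked = rerank(cands, top_source="catan.pdf")
# The header is filtered, the duplicate collapsed, and chunk 1 boosted.
```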
Cross-language search
This surprises people: you can ask a question in Italian and get an answer sourced from an English rulebook, with no manual translation step.
jina-v2-small-en, despite the "en" in its name, handles multiple languages. It was trained on multilingual text, so it learned cross-lingual semantic relationships. "Come funziona il combattimento?" and "how does combat work?" end up as nearby vectors even though the words are completely different.
Italian questions match English rulebook chunks. German questions match Spanish rulebook chunks. The cross-lingual property is strongest for high-resource languages — English, Italian, German, French, Spanish, Portuguese, Russian, Japanese, Polish, Chinese — but it works well enough that users don't need to think about it.
The language model then responds in your question's language. The search happens in multilingual embedding space; the answer comes back in whatever language you asked in.
The approximate nearest neighbour trade-off
Finding the closest vectors by brute force means comparing your question vector against every stored chunk vector. That scales linearly with the number of chunks, and in a space of millions of vectors it gets impractical fast.
Board Game Librarian uses HNSW: Hierarchical Navigable Small World graphs. It organises vectors into a graph structure that lets you navigate quickly to the nearest neighbours without checking everything.
The "approximate" in "approximate nearest neighbour" is the trade-off. HNSW doesn't guarantee it finds the absolute closest vector. It might miss the #1 match and return #2 or #3 instead. For question-answering, this rarely matters. If two chunks score 0.81 and 0.80, they're both highly relevant. The tiny recall loss is invisible to users.
The speed gain is dramatic. Searches that might take seconds with brute force complete in milliseconds. For a system handling live questions, this isn't negotiable.
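In pgvector, switching to HNSW is a matter of creating the right index; the query itself stays the same. A hedged sketch of the SQL involved, with an assumed table name (`chunks`) and column name (`main_vector`). The `vector_cosine_ops` operator class and the `<=>` cosine-distance operator are real pgvector syntax:

```python
# Table and column names are assumptions for illustration.
CREATE_INDEX_SQL = """
CREATE INDEX ON chunks
USING hnsw (main_vector vector_cosine_ops);
"""

# `<=>` returns cosine distance (1 - cosine similarity), so ordering
# by ascending distance returns the most similar chunks first. With
# the HNSW index in place, this ORDER BY ... LIMIT query uses the
# graph instead of scanning every row.
SEARCH_SQL = """
SELECT chunk_id, 1 - (main_vector <=> %(query_vec)s) AS similarity
FROM chunks
ORDER BY main_vector <=> %(query_vec)s
LIMIT 10;
"""
```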
Where semantic search struggles
Semantic search isn't infallible. Knowing its weaknesses helps set expectations.
Highly specific references. If a user asks about "the third ability on the Tower card," semantic search may score the right chunk lower than you'd expect. Specific card names and narrow rule references are context-starved in embedding space — the model hasn't seen enough signal to know exactly where they sit relative to your question.
First query on a new game. The system processes PDFs on-demand. When a game's rulebook hasn't been chunked and embedded yet (because nobody has asked about it before), the first query gets a RAW-FIRST fallback: the full extracted text is used directly, without vector search. Subsequent queries use proper semantic search. This only affects the very first question for any given game.
Unusual terminology. Some games use invented vocabulary. A game that invents "floortile actions" as a mechanic name may see weaker matches simply because the embedding model has no prior context for that term. Query expansion helps here, but it's not a complete solution.
Very short questions. "Can I?" or "what about resources?" don't give the embedding model enough signal. Specific questions perform better: "can I use a resource card on an opponent's turn?"
Common questions
Does the system search across all games at once? No. When you select a game, the search is scoped to that game's rulebook chunks. This keeps results focused and prevents a chunk from one game appearing in an answer about another.
What if the rulebook doesn't contain the answer? The language model is instructed not to invent answers. If the retrieved chunks don't cover the question, the answer will say so — typically noting that the rulebook doesn't address this scenario directly, and sometimes pointing to where a related rule does exist.
Why does the answer cite specific page numbers? Page references are derived from character positions in the extracted text, mapped back to the original PDF structure. They're accurate to within 1–2 pages — not perfect, but close enough to let you verify in the physical book.
Can the search handle questions about game variants or expansions? Only if the expansion rulebook has been imported. Each PDF is a separate set of chunks. If the expansion isn't in the system, its rules aren't searchable.
Is the embedding model the same one used to write the answer? No. The embedding model (jina-v2-small-en) only produces vectors — it doesn't generate text. The language model (GPT-4o or Claude via OpenRouter) reads the retrieved chunks and writes the answer. They're separate systems with different jobs.