Building Board Game Librarian: the story so far
Last updated: March 23, 2026
These are working notes from someone who's been close to the project — not a press release.
Where it started
The original idea was small: a Telegram bot that could answer rules questions for board games. It started in late 2025 as one service, a PostgreSQL database, and a handful of game rulebooks stored as PDFs.
The problem it was trying to solve is real. You're mid-game, something ambiguous comes up, and you either interrupt everything to search through a 200-page rulebook, or you make a ruling that might be wrong and move on. Neither is satisfying. The bet was that if you could ask a question in natural language and get a specific, cited answer back in under ten seconds, that would actually be useful.
So we built the Telegram bot. It would find the relevant section in the rulebook and answer. Straightforward in concept. The implementation — vectorising thousands of pages of rulebook text, handling disambiguation, producing answers that cited specific page ranges — was considerably less so.
The first real question
The first real test of whether it worked came quickly. Someone asked about a rules interaction in a complex game, the kind of edge case buried in an FAQ addendum that most players don't know exists. The bot found it. Page number and everything.
That confirmed the core was working. It also told us what we'd need to keep doing: adding rulebooks. Without content, the system is useless. We built pipelines to extract text from PDFs using Apache Tika, generate embeddings with sentence-transformers, and store everything in pgvector for similarity search.
One architectural decision from this phase still shapes everything: PDF chunking is on-demand. When a rulebook is imported, the text is extracted immediately. But the actual chunking and embedding generation doesn't happen until someone asks a question about that game. There are now over 3,300 rulebooks in the system, many in a pending state — and that's correct, not a backlog to clear. The compute runs when users actually want answers, not speculatively.
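The on-demand pattern can be sketched in a few lines. This is a simplified, in-memory illustration; names like `ensure_chunked` are hypothetical, and the real system stores chunks and embeddings in PostgreSQL/pgvector rather than on the object:

```python
from dataclasses import dataclass, field

@dataclass
class Rulebook:
    text: str                      # extracted at import time
    status: str = "pending"        # chunking and embedding deferred
    chunks: list[str] = field(default_factory=list)

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split extracted text into overlapping chunks for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ensure_chunked(book: Rulebook) -> Rulebook:
    """Chunk (and, in the real system, embed) only on the first question."""
    if book.status == "pending":
        book.chunks = chunk(book.text)
        book.status = "ready"
    return book
```

A rulebook imported today costs almost nothing until its first question arrives; the price is a slower first answer for that game.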
Adding a web interface
Telegram worked, but it had limits. Not everyone wants to use it, and some answers — long, structured, with citations — read better in a web UI than in a chat message.
In January 2026 we launched the web chat interface. This forced us to think harder about the response pipeline. Telegram messages have length limits and plain text formatting. A web interface can show formatted markdown, clickable citation links, expandable source sections. Same underlying answer, different presentation.
The Q&A architecture got more structured too. We introduced a two-tier system: Tier 1 is a fast response (5–12 seconds) using vector search and AI synthesis. Tier 2 is a slower, deeper analysis (up to 35 seconds) that also pulls from community forum discussions — useful for edge cases and unofficial FAQs that aren't in the rulebook. The two tiers aren't separate systems; they're stages in the same pipeline, with Tier 2 kicking in when the question complexity warrants it.
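One way to picture the tier handoff is as a single function with an escalation branch. The thresholds and the `search_forums` helper below are illustrative stand-ins, not the production heuristics:

```python
def search_forums(question: str) -> list[dict]:
    """Placeholder for the Tier 2 community-thread search."""
    return [{"score": 0.6, "source": "forum"}]

def answer(question: str, rulebook_hits: list[dict]) -> dict:
    """Tier 1: vector search + AI synthesis. Escalate to Tier 2 when the
    rulebook evidence looks too weak to answer confidently."""
    best = max((h["score"] for h in rulebook_hits), default=0.0)
    if best >= 0.75 and len(rulebook_hits) >= 2:
        return {"tier": 1, "sources": rulebook_hits}   # fast path, 5-12 s
    # Tier 2: same pipeline, plus community forum threads (up to 35 s)
    return {"tier": 2, "sources": rulebook_hits + search_forums(question)}
```

The point of the shape: Tier 2 is not a second system, it is the same call with more evidence gathered before synthesis.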
We also added OpenTelemetry tracing during this period. Not glamorous work, but when something's slow or wrong, you need to know where in the pipeline it happened.
Choosing how to search
February 2026 was when the project changed shape. Up to that point, it was a Q&A tool with one interface (Telegram) and one secondary interface (web chat). The question became: could this be useful to publishers themselves, not just players?
Yes — but it required a different delivery mechanism. A publisher adding a rules assistant to their game's product page can't tell their visitors to use a separate app. The assistant needs to come to the publisher's site, not the other way around.
So we built an embeddable widget. This meant solving several problems at once: session management (each visitor on a publisher's site is a different user), API key authentication (publishers authenticate with us, not their visitors), domain restrictions to prevent key theft, and UI flexibility because different sites have different visual languages.
The result was an iframe-based embed system with a JavaScript initialisation path for publishers who want more control. The same widget.js serves both a fan site and a publisher selling their latest game: the same code, configured differently via parameters.
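The domain-restriction check is the piece that makes a publicly visible API key safe to ship in page source. A minimal sketch, with a hypothetical key registry and `origin_allowed` helper (the real system's key format and storage differ):

```python
from urllib.parse import urlparse

# Hypothetical registry: API key -> domains allowed to embed the widget
ALLOWED = {"pk_demo_123": {"example-publisher.com", "shop.example-publisher.com"}}

def origin_allowed(api_key: str, origin: str) -> bool:
    """Reject embed requests whose Origin header isn't on the key's
    allow-list, so a copied key is useless on another site."""
    host = urlparse(origin).hostname or ""
    return host in ALLOWED.get(api_key, set())
```

Because the browser sets the Origin header itself, a stolen key presented from the wrong domain fails this check before any question is processed.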
Building for publishers: the widget
Getting the widget right took several iterations.
The transparent background mode came from publisher feedback. When you embed something with a white or dark background into a site with a custom design, it looks like a foreign object dropped in. transparent=1 removes the widget container's background so the publisher's page shows through. Small feature, but it matters for pages where visual design has been carefully considered.
The gameId parameter solved a different problem. If you're running a dedicated product page for one game, you don't want visitors selecting a game from a dropdown — they're already on that game's page. Pre-selecting it removes a step and makes the widget feel like it belongs there.
Language detection turned out to be simpler than expected, but the usage data was surprising. The widget auto-detects the language of each question and responds in that language. We'd expected most users to type in English. A notable portion type in their home language, even on English-language publisher sites. The system handles 10 languages now: English, Italian, German, French, Spanish, Portuguese, Russian, Japanese, Polish, and Chinese. We didn't plan all ten from day one — Russian, Japanese, Polish, and Chinese were added in March 2026 because usage data showed they were needed.
The partner portal
The widget needed somewhere to configure it. That became the Partner Portal — a separate authenticated section of the web application where publishers manage their account, widgets, games, and billing.
The portal grew significantly over February and March 2026. Early versions had basic widget configuration. Later versions added much more: team management with role-based access (managers can invite and remove team members; viewers have read-only access); audit logs tracking every meaningful action; in-app notifications for account events; billing with support for multiple legal entity types, including Italian forms (S.r.l., S.p.A.) and German GmbH; and per-widget usage statistics covering question counts, response time distributions, and language breakdowns.
The audit log was built after a partner asked whether they could see who on their team had changed widget configuration. Reasonable question for any business account system. Now every login, configuration change, and team event is logged with timestamp, actor, and IP address.
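The audit entry itself is simple; the value is in writing it everywhere. A minimal sketch with a hypothetical `record` helper and in-memory log (the real system persists entries to the database):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEvent:
    actor: str
    action: str
    ip: str
    at: str    # ISO-8601 UTC timestamp

LOG: list[AuditEvent] = []

def record(actor: str, action: str, ip: str) -> AuditEvent:
    """Append an immutable audit entry; every login, configuration
    change, and team event goes through a call like this."""
    ev = AuditEvent(actor, action, ip, datetime.now(timezone.utc).isoformat())
    LOG.append(ev)
    return ev
```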
Community knowledge
Rulebooks are the primary source, but they're not the only source of rules knowledge. Publishers issue FAQ documents. The community discusses edge cases in forum threads. Some rulings only exist in those discussions.
In February 2026 we integrated community forum data into the Tier 2 pipeline. When someone asks a question the rulebook doesn't cleanly answer, the deep analysis now also searches community threads — looking for relevant discussions, scored by how likely they are to contain a useful ruling rather than just chatter.
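The "ruling vs. chatter" scoring can be illustrated with a toy heuristic. The signal words, weights, and thread shape below are made up for the example; the production scorer is more involved:

```python
RULING_SIGNALS = ("official", "faq", "designer", "errata", "page")

def ruling_score(thread: dict) -> float:
    """Rough estimate of how likely a forum thread contains a usable
    ruling rather than chatter: keyword signals plus a small boost
    for discussion depth."""
    text = (thread["title"] + " " + thread["body"]).lower()
    signal = sum(word in text for word in RULING_SIGNALS) / len(RULING_SIGNALS)
    depth = min(thread.get("replies", 0), 10) / 10
    return 0.8 * signal + 0.2 * depth
```

Only threads scoring above a cutoff would be passed to the Tier 2 synthesis step alongside the rulebook chunks.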
Globally, the system has around 584,000 threads and 3.8 million posts covering 4,457 library games. Smaller or newer games may have very little community data. One thing to be clear about: community data supplements the rulebook, it doesn't replace it. When an answer draws from a community thread, that's shown explicitly, and the user can see which thread it came from.
Ten languages
The original language support was English, Italian, German, French, Spanish, and Portuguese. Six languages covered the core European markets.
Adding Russian was the first expansion. Then Japanese. Then Polish and Chinese. Each addition meant updating the shared commands module, updating all AI prompt templates with stricter rules about not mixing languages in responses, and testing that the language detection actually works reliably for the new language.
The language detection issue we spent the most time on: early versions of the extraction pipeline would sometimes detect the language of the game's name rather than the question text. A question like "How does [Japanese game title] work?" would be tagged as Japanese even when asked in English. The fix sounds obvious in retrospect — weigh the question text more heavily, ignore proper nouns for language detection — but finding the root cause took a while.
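The fix can be sketched as two steps: strip the known game title out of the text before detection, then detect on what remains. Both helpers below are illustrative; `looks_japanese` is a toy stand-in for the real detector:

```python
def detection_input(question: str, game_title: str) -> str:
    """Remove the game's title so it can't dominate language detection."""
    return question.replace(game_title, " ")

def looks_japanese(text: str) -> bool:
    """Toy detector: any kana or CJK codepoint counts as Japanese."""
    return any("\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
               for ch in text)
```

With the title removed, a question like "How does [Japanese game title] work?" is detected from the surrounding English words instead of the title's script.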
What we've learned
A few things stand out.
On-demand processing beats pre-processing. Only chunking and embedding PDFs when someone actually asks about a game means we're not spending compute on 3,300 rulebooks that nobody queries. The trade-off is that the first question about a game is slower (cold start). We think we made the right call, but the cost is real.
Citations change how people trust the answers. Early versions gave answers without page references. Users weren't sure whether to trust them. Adding specific page ranges made a visible difference in how answers were received — people could verify. The citation extraction work was tedious but worth it.
Infrastructure correctness matters as much as AI quality. Some of the most impactful fixes had nothing to do with AI prompts or embeddings. A bug where community forum threads were invisible to the search system because of a wrong join condition. A race condition in the enrichment worker that corrupted page counts. A session bug that caused Tier 2 searches to use the wrong game. These weren't ML problems; they were software problems. Getting them right made the AI quality visible in a way it wasn't before.
Building for publishers is different from building for users. Users want answers. Publishers want control, visibility, and reliability. The partner portal needed audit logs, team roles, and usage statistics not because they make the Q&A better, but because a business can't adopt a tool they can't manage.
What comes next
We don't fully know yet.
There are obvious directions. Expanding the game catalogue — 3,300 rulebooks is a lot, but it's far from complete coverage. Improving the Tier 2 community analysis, particularly for games where the community has produced detailed unofficial FAQs that haven't made it into any official document. Expanding language support further as usage data points to demand.
The less obvious work is in the details. PDF page citations accurate to the exact page, not a range. Better handling of games with multiple editions and different rules. Smarter handling of "it depends" questions — the ones where the correct answer depends on which variant or optional rule you're using.
We'll keep publishing these notes as things develop. If you're a publisher thinking about integration, or a player with feedback about an answer you received, reach out. The system improves most from real usage — questions we didn't anticipate, answers that were wrong, games that aren't covered yet.