10 Languages, One Pipeline

Last updated: March 24, 2026

The language challenge in multilingual RAG

Answering rules questions in 10 languages is not just a translation problem. It is a pipeline problem. The rulebook is in English. The user might ask in Italian. The community forums are in English. The AI needs to answer in Italian.


The key insight: embeddings are language-agnostic enough that a question in Italian semantically matches English rulebook chunks. The language separation happens at the synthesis layer, not the retrieval layer.
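A minimal sketch of that split, under stated assumptions: the vectors are toy values standing in for the output of a multilingual embedding model, and `Chunk`, `retrieve`, and `build_synthesis_input` are hypothetical names rather than the pipeline's real ones. The point it illustrates is that retrieval ranks purely on vector similarity, and the user's language appears only in the payload handed to synthesis.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    text: str            # English rulebook text
    vector: list[float]  # toy embedding values, not real model output


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question_vector: list[float], chunks: list[Chunk], k: int = 1) -> list[Chunk]:
    # No language filter here: ranking depends only on vector similarity.
    ranked = sorted(chunks, key=lambda c: cosine(question_vector, c.vector), reverse=True)
    return ranked[:k]


def build_synthesis_input(question: str, user_lang: str, context: list[Chunk]) -> dict:
    # The user's language enters the pipeline only at this point.
    return {
        "question": question,
        "answer_language": user_lang,
        "context": [c.text for c in context],
    }


chunks = [
    Chunk("Each player draws two cards at the start of the round.", [0.9, 0.1, 0.0]),
    Chunk("The game ends when the last tile is placed.", [0.1, 0.9, 0.2]),
]
# Toy vector standing in for the embedding of an Italian question about drawing cards.
italian_question_vector = [0.8, 0.2, 0.1]

top = retrieve(italian_question_vector, chunks)
payload = build_synthesis_input("Quante carte si pescano all'inizio del round?", "it", top)
```

In this toy setup the Italian question retrieves the English chunk about drawing cards, and `payload["answer_language"]` carries `"it"` forward to the synthesis step.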

Supported languages

Code  Language    Notes
en    English     Primary, all features
it    Italian     Full support
de    German      Full support
fr    French      Full support
es    Spanish     Full support
pt    Portuguese  Full support
ru    Russian     Full support
ja    Japanese    Full support
pl    Polish      Full support
zh    Chinese     Full support

Language detection

I use the extraction YAML (v3.0) to detect the language of the incoming question. This is a dedicated classification step that runs before the main pipeline. The key constraint: I use only content-independent signals (grammatical patterns and function words), not game-specific vocabulary, because game names like "Barrage" or "Wingspan" appear in questions in every language.
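To make the "content-independent signals" idea concrete, here is a small heuristic sketch. It is not the extraction YAML itself, which is a classification step whose contents are not shown here. Instead it swaps in a plain function-word count, with short illustrative word lists for five of the ten languages. The names `FUNCTION_WORDS` and `detect_language` are hypothetical, and the approach as written would not work for Japanese or Chinese, which are not whitespace-delimited.

```python
import re

# Short illustrative function-word lists; a real detector would need far more.
# Game names are deliberately absent, so they can never influence the score.
FUNCTION_WORDS = {
    "en": {"the", "is", "can", "what", "when", "does", "with", "and", "of"},
    "it": {"il", "che", "posso", "quando", "con", "di", "si", "nel", "gli"},
    "de": {"der", "die", "das", "ist", "kann", "wenn", "mit", "und", "ich"},
    "fr": {"le", "les", "est", "quand", "avec", "je", "peut", "des", "dans"},
    "es": {"el", "los", "cuando", "puedo", "una", "del", "hay", "para", "pero"},
}


def detect_language(question: str, default: str = "en") -> str:
    """Return the language whose function words occur most often in the question."""
    # Split into runs of letters (Unicode-aware), lowercased.
    tokens = re.findall(r"[^\W\d_]+", question.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in FUNCTION_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no function-word evidence at all, fall back to the default.
    return best if scores[best] > 0 else default


italian = detect_language("Posso giocare Wingspan con due giocatori?")
german = detect_language("Kann ich Barrage mit zwei Spielern spielen?")
names_only = detect_language("Wingspan Barrage")
```

A question made up only of game names scores zero everywhere and falls back to the default, which is the behaviour the constraint above is meant to guarantee.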

The BGG forum challenge

BGG forums are predominantly English. When a question in any non-English language escalates to Tier 2, I search BGG using the game's English name and English keywords, regardless of the user's language. The community results are then synthesised together with the rulebook into a response in the user's language.
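The search step can be sketched as follows. This assumes the English game name and English keywords are already available from an earlier step (how they are derived is not covered here). `ForumQuery` and `build_forum_query` are hypothetical names, and no real BGG API call is shown. The sketch only illustrates that the query text is always English while the user's language travels alongside it for the later synthesis step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForumQuery:
    text: str             # always English: game name plus English keywords
    answer_language: str  # the user's language, used only at synthesis time


def build_forum_query(
    game_english_name: str,
    english_keywords: list[str],
    user_lang: str,
) -> ForumQuery:
    """Build an English forum query regardless of the user's language."""
    terms = [game_english_name, *english_keywords]
    # Drop empty terms and stray whitespace before joining.
    text = " ".join(term.strip() for term in terms if term.strip())
    return ForumQuery(text=text, answer_language=user_lang)


query = build_forum_query("Barrage", ["contracts", "scoring"], user_lang="it")
```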

Synthesis templates

Each of the 6 question categories has a synthesis template with a strong explicit instruction: answer in the same language as the question, never switch languages mid-answer, never output text in a different language than the question language. This instruction is present in all 20+ synthesis YAML files.
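As an illustration of what such an instruction might look like once rendered for a given language, here is a hedged sketch. The actual templates are YAML files whose exact wording is not reproduced here, so the phrasing, `LANGUAGE_NAMES`, and `language_instruction` are assumptions made for the example. The language codes are the ones from the supported-languages table.

```python
# Codes and names taken from the supported-languages table.
LANGUAGE_NAMES = {
    "en": "English",
    "it": "Italian",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pt": "Portuguese",
    "ru": "Russian",
    "ja": "Japanese",
    "pl": "Polish",
    "zh": "Chinese",
}


def language_instruction(code: str) -> str:
    """Render the language rule for one supported language code.

    Raises KeyError for an unsupported code rather than guessing.
    """
    name = LANGUAGE_NAMES[code]
    return (
        f"Answer in {name}, the language of the question. "
        "Never switch languages mid-answer. "
        f"Never output text in any language other than {name}."
    )


instruction = language_instruction("it")
```

Generating the rule from one function, rather than hand-writing it per file, is one way to keep the wording identical across every template.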