10 Languages, One Pipeline
Last updated: March 24, 2026
The language challenge in multilingual RAG
Answering rules questions in 10 languages is not just a translation problem. It is a pipeline problem. The rulebook is in English. The user might ask in Italian. The community forums are in English. The AI needs to answer in Italian.
The key insight: the embeddings are language-agnostic enough that a question in Italian still matches the semantically relevant English rulebook chunks. Language only matters at the synthesis layer, not at the retrieval layer.
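A minimal sketch of that split, in Python. The vectors are hand-made stand-ins for a multilingual embedding model's output, the chunk texts are invented, and `Chunk`, `retrieve` and `build_synthesis_input` are hypothetical names. What it shows is structural: retrieval ranks by cosine similarity and never receives a language code, while the user's language is attached only to the synthesis input.

```python
import math
from dataclasses import dataclass


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Chunk:
    text: str            # English rulebook text
    vector: list[float]  # precomputed embedding (toy values here)


def retrieve(query_vec: list[float], chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Rank chunks by similarity alone: no language code is involved."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


def build_synthesis_input(question: str, user_lang: str, chunks: list[Chunk]) -> dict:
    """The user's language enters the pipeline only here, at synthesis."""
    return {
        "question": question,
        "answer_language": user_lang,
        "context": "\n".join(c.text for c in chunks),
    }


# Toy data: two invented English chunks and a pretend embedding of an Italian question.
chunks = [
    Chunk("A player may build at most one structure per action.", [0.9, 0.1, 0.0]),
    Chunk("Points are scored at the end of each round.", [0.1, 0.9, 0.0]),
]
italian_question = "Quante strutture posso costruire per azione?"
query_vec = [0.85, 0.15, 0.05]

top = retrieve(query_vec, chunks, k=1)
payload = build_synthesis_input(italian_question, "it", top)
```

The Italian question retrieves the English chunk purely on vector proximity; only `payload["answer_language"]` tells the synthesis step to answer in Italian.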
Supported languages
| Code | Language | Notes |
|---|---|---|
| en | English | Primary, all features |
| it | Italian | Full support |
| de | German | Full support |
| fr | French | Full support |
| es | Spanish | Full support |
| pt | Portuguese | Full support |
| ru | Russian | Full support |
| ja | Japanese | Full support |
| pl | Polish | Full support |
| zh | Chinese | Full support |
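The table can be mirrored as a small registry in code. This is a hypothetical sketch: `normalize_language` and the fallback to English for missing or unsupported codes are assumptions, not behaviour the article specifies.

```python
from typing import Optional

# Mirrors the table above: ISO 639-1 code -> display name.
SUPPORTED_LANGUAGES: dict[str, str] = {
    "en": "English", "it": "Italian", "de": "German", "fr": "French",
    "es": "Spanish", "pt": "Portuguese", "ru": "Russian", "ja": "Japanese",
    "pl": "Polish", "zh": "Chinese",
}

# Assumption: unsupported or missing codes fall back to the primary language.
DEFAULT_LANGUAGE = "en"


def normalize_language(code: Optional[str]) -> str:
    """Reduce a tag such as 'pt-BR' or 'ZH_hans' to a supported base code."""
    if not code:
        return DEFAULT_LANGUAGE
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return base if base in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE
```

Keeping only the base code means regional variants such as `pt-BR` share one Portuguese configuration.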
Language detection
I use the extraction YAML (v3.0) to detect the language of the incoming question. This is a dedicated classification step that runs before the main pipeline. The key constraint: I use only content-independent signals (grammatical patterns, function words) and never game-specific vocabulary, because game names like "Barrage" or "Wingspan" appear unchanged in questions in every language and so say nothing about which language the question is in.
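The article's detector is an LLM classification step driven by the extraction YAML. As a stand-in that illustrates the same constraint, here is a rule-based heuristic (a swapped-in technique, not the author's method). Unicode script ranges settle Japanese, Chinese and Russian; for Latin-script languages it counts hits against tiny, hypothetical function-word lists after stripping known game titles. Titles are removed because they carry no language signal and can even contain English function words, as in "Through the Ages". The English default when nothing matches is also an assumption.

```python
import re

# Hypothetical, deliberately tiny function-word lists for the Latin-script languages.
FUNCTION_WORDS: dict[str, set[str]] = {
    "en": {"the", "is", "can", "do", "when", "what", "if", "of"},
    "it": {"il", "che", "posso", "quando", "della", "non", "una", "di"},
    "de": {"der", "die", "das", "und", "ist", "kann", "ich", "wenn"},
    "fr": {"le", "les", "est", "que", "peut", "je", "quand", "des"},
    "es": {"el", "los", "puedo", "cuando", "que", "es", "una", "del"},
    "pt": {"o", "os", "posso", "quando", "não", "uma", "do", "da"},
    "pl": {"czy", "jest", "nie", "się", "mogę", "kiedy", "w", "na"},
}


def detect_language(question: str, game_names: list[str]) -> str:
    # 1. Script ranges: any kana means Japanese; Han without kana, Chinese;
    #    Cyrillic, Russian (the only Cyrillic-script language supported).
    if re.search(r"[\u3040-\u30ff]", question):
        return "ja"
    if re.search(r"[\u4e00-\u9fff]", question):
        return "zh"
    if re.search(r"[\u0400-\u04ff]", question):
        return "ru"
    # 2. Strip game titles, which look the same in every language.
    text = question
    for name in game_names:
        text = re.sub(re.escape(name), " ", text, flags=re.IGNORECASE)
    # 3. Letters-only tokens ([^\W\d_] matches any Unicode letter), lowercased.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in FUNCTION_WORDS.items()}
    best = max(scores, key=scores.get)
    # Assumption: default to English when no function word matches.
    return best if scores[best] > 0 else "en"
```

For example, `detect_language("Kann ich die Karte spielen, wenn der Zug vorbei ist?", [])` scores six German function words and nothing elsewhere. Real word lists would need to be far larger, and overlaps such as Italian and Portuguese "posso" are exactly why a model-based classifier is the sturdier choice.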
The BGG forum challenge
BGG forums are predominantly English. When a question in any non-English language escalates to Tier 2, I search BGG using the game's English name and English keywords, regardless of the user's language. The community results are then synthesised together with the rulebook into a response in the user's language.
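A sketch of that split, assuming the extraction step already yields the game's English title and English keywords. `ParsedQuestion`, `build_bgg_search_query` and `build_synthesis_request` are hypothetical names, and the query is a plain search string rather than any specific BGG API call. The search side ignores `user_language` entirely; it resurfaces only in the synthesis request.

```python
from dataclasses import dataclass


@dataclass
class ParsedQuestion:
    user_language: str      # detected language code, e.g. "it"
    game_name_en: str       # canonical English title of the game
    keywords_en: list[str]  # English keywords, assumed to come from the extraction step


def build_bgg_search_query(q: ParsedQuestion) -> str:
    """English-only query: user_language is deliberately not used here."""
    return " ".join([f'"{q.game_name_en}"', *q.keywords_en])


def build_synthesis_request(q: ParsedQuestion, rulebook: list[str], forum: list[str]) -> dict:
    """Rulebook and forum evidence go in together; the answer language comes from the user."""
    return {
        "answer_language": q.user_language,
        "rulebook_chunks": rulebook,
        "community_posts": forum,
    }


q = ParsedQuestion(user_language="it", game_name_en="Wingspan",
                   keywords_en=["bonus card", "end of round"])
query = build_bgg_search_query(q)
request = build_synthesis_request(q, ["<rulebook chunk>"], ["<forum post>"])
```

Here `query` is `"Wingspan" bonus card end of round` even though the user asked in Italian.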
Synthesis templates
Each of the 6 question categories has a synthesis template with a strong, explicit instruction: answer in the same language as the question, never switch languages mid-answer, and never output text in any language other than the question's. This instruction is present in all 20+ synthesis YAML files.
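With the same rule copied into 20+ files, a lint-style check is one way to keep them consistent. A sketch under stated assumptions: the rule wording, the `{language}` placeholder, and the template names `timing.yaml` and `scoring.yaml` are hypothetical, and templates are shown as in-memory strings rather than parsed YAML.

```python
# Hypothetical wording; the article describes the rule but not its exact text.
LANGUAGE_RULE = (
    "Answer in {language}, the same language as the question. "
    "Never switch languages mid-answer. "
    "Never output text in any language other than {language}."
)

LANGUAGE_NAMES = {
    "en": "English", "it": "Italian", "de": "German", "fr": "French",
    "es": "Spanish", "pt": "Portuguese", "ru": "Russian", "ja": "Japanese",
    "pl": "Polish", "zh": "Chinese",
}


def render_template(template: str, lang_code: str) -> str:
    """Fill the language placeholder; str.replace leaves any other braces alone."""
    return template.replace("{language}", LANGUAGE_NAMES[lang_code])


def templates_missing_rule(templates: dict[str, str]) -> list[str]:
    """Return the names of templates that lack the 'never switch languages' clause."""
    marker = "never switch languages"
    return sorted(name for name, text in templates.items() if marker not in text.lower())


# Two hypothetical template files: one carries the rule, one forgot it.
templates = {
    "timing.yaml": "You explain when actions may be taken.\n" + LANGUAGE_RULE,
    "scoring.yaml": "You explain end-game scoring.",
}
missing = templates_missing_rule(templates)             # ["scoring.yaml"]
prompt = render_template(templates["timing.yaml"], "de")
```

Running such a check in CI would catch a new category template that was added without the language rule.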