10 languages, one pipeline

The language challenge in multilingual RAG

Answering rules questions in 10 languages is not just a translation problem. It is a pipeline problem. The rulebook is in English. The user might ask in Italian. The community forums are in English. The AI needs to answer in Italian.


The key insight: embeddings are language-agnostic enough that a question in Italian semantically matches English rulebook chunks. The language separation happens at the synthesis layer, not the retrieval layer.
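A minimal sketch of that separation, assuming a retriever and a synthesiser are injected as plain callables. The names `answer_question`, `retrieve`, `synthesise` and `Answer` are hypothetical and not taken from the actual pipeline. The point is only that the retriever never receives a language parameter, while the synthesiser does.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    language: str
    sources: List[str]


def answer_question(
    question: str,
    user_lang: str,
    retrieve: Callable[[str], List[str]],
    synthesise: Callable[[str, List[str], str], str],
) -> Answer:
    """Hypothetical orchestration: language matters only at synthesis."""
    # Retrieval sees the raw question as asked: no translation, no language flag.
    chunks = retrieve(question)
    # The user's language enters the pipeline only at this step.
    text = synthesise(question, chunks, user_lang)
    return Answer(text=text, language=user_lang, sources=chunks)
```

Keeping the language out of the retriever's signature makes it hard to add per-language retrieval branches by accident.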

Supported languages

Code	Language	Notes
en	English	Primary, all features
it	Italian	Full support
de	German	Full support
fr	French	Full support
es	Spanish	Full support
pt	Portuguese	Full support
ru	Russian	Full support
ja	Japanese	Full support
pl	Polish	Full support
zh	Chinese	Full support

Language detection

I use the extraction YAML (v3.0) to detect the language of the incoming question. This is a dedicated classification step that runs before the main pipeline. The key constraint: I use only content-independent signals (grammatical patterns and function words), not game-specific vocabulary, because game names like "Barrage" or "Wingspan" appear unchanged in questions in every language.
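To illustrate the content-independent idea, here is a rough sketch that scores a question by function words only. It is not the extraction YAML itself. The word lists, the `detect_language` name and the English fallback are all assumptions, and only five of the ten languages are covered (Japanese, Chinese and Russian would more naturally be detected by script).

```python
import re

# Hypothetical, deliberately tiny function-word lists. No game vocabulary.
FUNCTION_WORDS = {
    "en": {"the", "is", "and", "of", "can", "when", "what", "do", "does"},
    "it": {"il", "che", "di", "è", "posso", "quando", "una", "non", "gli"},
    "de": {"der", "die", "das", "und", "ist", "ich", "kann", "nicht", "wenn"},
    "fr": {"le", "les", "est", "et", "je", "que", "quand", "une", "des"},
    "es": {"el", "los", "es", "y", "puedo", "que", "cuando", "una", "del"},
}


def detect_language(question: str, default: str = "en") -> str:
    """Return the language whose function words occur most often."""
    # Split into runs of letters (Unicode-aware), ignoring digits and punctuation.
    tokens = re.findall(r"[^\W\d_]+", question.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in FUNCTION_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no function-word hits at all, fall back to the default.
    return best if scores[best] > 0 else default
```

A question such as "Posso usare Wingspan quando il turno è finito?" is classified as `it` on the strength of "posso", "quando", "il" and "è". The game name contributes nothing to any score.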

The community forum challenge

Community forums are predominantly English. When a question in any non-English language escalates to Tier 2, I search the forums using the game's English name and English keywords, regardless of the user's language. The forum results are then synthesised together with the rulebook chunks into a response in the user's language.
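A sketch of how the Tier 2 forum search might be assembled, assuming the game's English name and a list of English keywords are already available from earlier steps. `ForumQuery` and `build_forum_query` are hypothetical names. The user's language is carried alongside the query but never influences the search text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForumQuery:
    search_text: str      # always English
    answer_language: str  # used later, at synthesis only


def build_forum_query(
    game_name_en: str, keywords_en: List[str], user_lang: str
) -> ForumQuery:
    """Build an English forum search regardless of the user's language."""
    # Quote the game name so multi-word titles are searched as a phrase.
    terms = [f'"{game_name_en}"'] + list(keywords_en)
    return ForumQuery(search_text=" ".join(terms), answer_language=user_lang)
```

For an Italian question about Wingspan, `build_forum_query("Wingspan", ["bonus", "card"], "it")` produces an English search string and keeps `"it"` aside for the synthesis step.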

Synthesis templates

Each of the 6 question categories has a synthesis template with a strong, explicit instruction: answer in the same language as the question, never switch languages mid-answer, never output text in any language other than the question's. This instruction is present in all 20+ synthesis YAML files.
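One way to keep that instruction identical across templates is to generate it from the language code. The sketch below assumes a template with `{language_instruction}`, `{question}` and `{context}` placeholders. The placeholder names, the function names and the exact wording are illustrative, not the contents of the real YAML files.

```python
# Language names taken from the supported-languages table above.
LANGUAGE_NAMES = {
    "en": "English", "it": "Italian", "de": "German", "fr": "French",
    "es": "Spanish", "pt": "Portuguese", "ru": "Russian", "ja": "Japanese",
    "pl": "Polish", "zh": "Chinese",
}


def language_instruction(lang_code: str) -> str:
    """Return the explicit language rule for one supported language."""
    name = LANGUAGE_NAMES[lang_code]  # raises KeyError for unsupported codes
    return (
        f"Answer in {name}, the language of the question. "
        "Never switch languages mid-answer. "
        f"Never output text in any language other than {name}."
    )


def render_prompt(template: str, lang_code: str, question: str, context: str) -> str:
    """Fill a synthesis template (hypothetical placeholder names)."""
    return template.format(
        language_instruction=language_instruction(lang_code),
        question=question,
        context=context,
    )
```

Generating the rule in one place means a wording change does not have to be repeated by hand in every template.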