A complete engineering breakdown of SmartDocs: 22 Indian languages, an 11-node LangGraph agent, 18 non-negotiable laws, and 10 production bugs fixed. Built from first principles for India's actual linguistic reality.
Every component is chosen for that reality: Devanagari Unicode, Hinglish queries, and bilingual documents. This is not English RAG with Hindi support bolted on.
From raw PDF to indexed vectors. Every stage handles an India-specific concern.
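The ingestion code itself is not shown. Below is a minimal sketch of two India-specific concerns, assuming text has already been extracted from the PDF. The first is Unicode NFC normalization, so that the precomposed nukta letter क़ (U+0958) and the sequence क + ़ (U+0915 U+093C) compare equal. The second is sentence splitting that treats the Devanagari danda (।) as a terminator. The names `normalize`, `split_sentences` and `chunk`, and the 500-character limit, are hypothetical.

```python
import re
import unicodedata

# Sentence terminators: Latin punctuation plus the Devanagari danda (U+0964)
# and double danda (U+0965). Split on the whitespace that follows them.
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964\u0965])\s+")


def normalize(text: str) -> str:
    """NFC-normalize so canonically equivalent Devanagari strings compare equal."""
    return unicodedata.normalize("NFC", text)


def split_sentences(text: str) -> list[str]:
    """Split mixed Hindi/English text into sentences, keeping the terminators."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def chunk(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of up to max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(normalize(text)):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than fixed character offsets avoids cutting a Devanagari conjunct or a Hindi sentence in half. Normalizing before chunking means the embedder and the BM25 index see one canonical form of each word.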
Language detection, query expansion, hybrid retrieval, and routing decisions. Everything runs asynchronously and in parallel.
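The source does not say how hybrid results are fused or which async framework is used. Below is a minimal sketch, assuming a dense and a sparse (BM25-style) retriever that run concurrently under `asyncio.gather`, with their rankings merged by reciprocal rank fusion (RRF). The constant k = 60 is the value commonly used for RRF. Both retrievers are hypothetical stubs that return fixed document IDs.

```python
import asyncio
from collections import defaultdict


# Hypothetical stand-ins for the real retrievers; each returns doc IDs best-first.
async def dense_search(query: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for an embedding + vector-store call
    return ["d3", "d1", "d7"]


async def sparse_search(query: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for a BM25 keyword-search call
    return ["d1", "d9", "d3"]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


async def hybrid_retrieve(query: str) -> list[str]:
    # Both retrievers are awaited concurrently rather than one after the other.
    dense, sparse = await asyncio.gather(dense_search(query), sparse_search(query))
    return reciprocal_rank_fusion([dense, sparse])
```

With these stubs, `asyncio.run(hybrid_retrieve("transformer kya hai"))` returns `["d1", "d3", "d9", "d7"]`. Documents that both retrievers rank highly rise to the top. RRF needs only ranks, not comparable scores, which suits fusing cosine similarities with BM25 scores.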
Not a pipeline. A stateful agent. Conditional edges, retry cycles, graceful failures.
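In LangGraph this pattern is built with `StateGraph`, `add_conditional_edges` and `compile()`. To keep the sketch dependency-free, the version below reimplements the same control flow in plain Python. It is a three-node illustration, not the SmartDocs 11-node graph. The node names, the `Search` callable and the retry budget are hypothetical. Each node returns a partial state update (its "state writes"). A router function picks the next node, and that choice is what creates the retry cycle and the graceful-failure path.

```python
from typing import Callable, Optional, TypedDict


class AgentState(TypedDict):
    query: str
    context: list[str]
    attempts: int
    answer: str


# Hypothetical types: a node returns a partial state update.
Node = Callable[[AgentState], dict]
Search = Callable[[str, int], list[str]]  # (query, attempt) -> document texts

MAX_RETRIES = 2  # hypothetical budget: up to 3 retrieval attempts in total


def make_retrieve(search: Search) -> Node:
    def retrieve(state: AgentState) -> dict:
        docs = search(state["query"], state["attempts"])
        return {"context": docs, "attempts": state["attempts"] + 1}
    return retrieve


def generate(state: AgentState) -> dict:
    # Placeholder for the LLM call; only reports how much context it received.
    return {"answer": f"Answer grounded in {len(state['context'])} document(s)."}


def fallback(state: AgentState) -> dict:
    # Graceful failure: admit there is no evidence instead of guessing.
    return {"answer": "No supporting documents found."}


def route_after_retrieve(state: AgentState) -> str:
    """Conditional edge out of 'retrieve': generate, retry, or give up."""
    if state["context"]:
        return "generate"
    if state["attempts"] <= MAX_RETRIES:
        return "retrieve"  # the cycle back is what makes this a graph, not a pipeline
    return "fallback"


def run(query: str, search: Search) -> AgentState:
    nodes: dict[str, Node] = {
        "retrieve": make_retrieve(search),
        "generate": generate,
        "fallback": fallback,
    }
    state: AgentState = {"query": query, "context": [], "attempts": 0, "answer": ""}
    node: Optional[str] = "retrieve"
    while node is not None:
        state.update(nodes[node](state))  # merge the node's state writes
        # 'retrieve' has a conditional edge; 'generate' and 'fallback' end the run.
        node = route_after_retrieve(state) if node == "retrieve" else None
    return state
```

Consider a search that returns nothing on the first attempt, such as `lambda q, attempt: [] if attempt == 0 else ["doc"]`. The run retries once and ends in `generate` with `attempts == 2`. A search that never returns anything ends in `fallback` after three attempts.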
The query "transformer kya hai" was detected as Norwegian. langdetect is statistically unstable on Hinglish and mixed-script text, and its output is non-deterministic unless seeded. This 7-step deterministic logic tree fixes it.
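# PRODUCTION FIXES APPLIED:
#   DetectorFactory.seed = 0 → deterministic langdetect output
#   Script detection runs BEFORE langdetect
#   Hinglish lexicon: 130 words
#   ASCII dominance: 85% threshold
from langdetect import DetectorFactory, detect_langs
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded

# LangResult, default_english, hi_result, en_result and the _helpers
# are defined elsewhere in SmartDocs.
def detect_language(text: str) -> LangResult:
    # Empty or whitespace-only input: default to English.
    if not text.strip():
        return default_english("empty")
    # A non-Latin Indic script identifies the language without statistics.
    script = _detect_script(text)
    if script:
        return LangResult(code=script)
    # Romanized Hindi ("kya", "hai", ...) matched against the lexicon.
    if _detect_hinglish(text):
        return hi_result
    # Mostly-ASCII text that is not Hinglish is treated as English.
    if _is_ascii_dominant(text):
        return en_result
    # Only now consult langdetect, and only trust a confident top guess.
    try:
        best = detect_langs(text)[0]  # Language objects, sorted by .prob
    except LangDetectException:
        best = None  # no usable features, e.g. digits or punctuation only
    if best is not None and best.prob >= 0.85:
        return LangResult(code=best.lang)  # wrap instead of leaking a Language
    # Low confidence: fall back to Indic heuristics.
    return _check_indic_fallback(text)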
The codebase was audited before a single API route was written. The bugs range from silent data corruption to runtime crashes and security bypasses, ranked by damage.
Every core architectural decision in SmartDocs traces back to peer-reviewed research.
Every architectural decision traces to one of the 18 laws. They encode hard-won lessons about multilingual RAG in production.
Hindi faithfulness / English faithfulness > 0.97, or it is not done. These metrics block deployment: an aggregate accuracy score that hides Hindi failures is not accepted.
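Below is a minimal sketch of the parity gate, assuming per-sample faithfulness scores in [0, 1] from some evaluator (the source does not name one) and a ratio computed from per-language means. The function names are hypothetical. The worked example shows how an aggregate can look healthy while the gate correctly blocks deployment.

```python
from statistics import fmean


def faithfulness_parity(hindi: list[float], english: list[float]) -> float:
    """Mean Hindi faithfulness divided by mean English faithfulness."""
    return fmean(hindi) / fmean(english)


def deployment_gate(hindi: list[float], english: list[float],
                    min_ratio: float = 0.97) -> bool:
    """Pass only if Hindi/English faithfulness strictly exceeds min_ratio."""
    if not hindi or not english or fmean(english) == 0:
        return False  # missing or degenerate per-language results never pass
    return faithfulness_parity(hindi, english) > min_ratio


# Why aggregates are not enough: 90 English and 10 Hindi evaluation samples.
english_scores = [0.95] * 90
hindi_scores = [0.80] * 10
aggregate = fmean(english_scores + hindi_scores)             # ≈ 0.935, looks fine
parity = faithfulness_parity(hindi_scores, english_scores)   # ≈ 0.842
blocked = not deployment_gate(hindi_scores, english_scores)  # True: deploy blocked
```

The aggregate is dominated by the larger English sample, so Hindi can degrade badly before the aggregate moves. The ratio compares the languages directly and is independent of how many samples each one has.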