Tuesday, 16 September, 2025
India’s “Indic Data Hunt”: Language Models Chasing Scarce Local Data

As Indian AI startups race to build Indic language models under the IndiaAI Mission, they face a major hurdle: a lack of diverse, high-quality data in regional languages. Early players like DeepSeek, SarvamAI, Soket Labs, Gnani.ai and BharatGPT are partnering with linguistic experts and publishers and crowdsourcing voice samples. Some are licensing content or using clients’ data.
Read the full story at The Economic Times