Tuesday, 16 September 2025

India’s “Indic Data Hunt”: Language Models Chasing Scarce Local Data

As Indian AI startups race to build Indic language models under the IndiaAI mission, they face a major hurdle: a lack of diverse, high-quality data in regional languages. Early players like DeepSeek, SarvamAI, Soket Labs, Gnani.ai and BharatGPT are partnering with linguistic experts and publishers and crowdsourcing voice samples. Some are licensing content or using clients' data.
