Tuesday, 30 September, 2025
DeepSeek Launches Sparse Attention Model, Halves API Costs

DeepSeek has unveiled V3.2-exp, an experimental sparse-attention model that uses a “lightning indexer” and fine-grained token selection to reduce inference costs. In long-context applications, this architecture can cut per-call API costs by up to 50%. The model is open-weight and publicly available on Hugging Face, so third parties can test the cost claims and adopt the model.
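
For readers curious how this kind of two-stage sparse attention can save work, here is a toy sketch in plain Python. It is not DeepSeek's implementation: the function name `sparse_attention`, the separate `index_query` and `index_keys` inputs, and the `top_k` parameter are all assumptions made for illustration. The idea shown is only the general pattern: a cheap scoring pass ranks tokens, then full attention runs over the few that were kept.

```python
import math


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(query, keys, values, index_query, index_keys, top_k):
    """Attend over only the top_k tokens picked by a cheap indexer score.

    Hypothetical sketch: index_query/index_keys stand in for a small,
    low-dimensional indexer; they are not taken from DeepSeek's code.
    """
    # Stage 1: score every token with the cheap indexer.
    index_scores = [dot(index_query, ik) for ik in index_keys]

    # Stage 2: keep the positions of the top_k highest-scoring tokens.
    k = min(top_k, len(keys))
    ranked = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)
    selected = sorted(ranked[:k])

    # Stage 3: ordinary scaled dot-product attention, restricted to the
    # selected tokens, so the expensive step touches k tokens, not all of them.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return output, selected


# Toy usage: four tokens in context, attention restricted to two of them.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0], [2.0], [3.0], [4.0]]
index_keys = [[0.9], [0.1], [0.8], [0.2]]
out, picked = sparse_attention([1.0, 0.0], keys, values, [1.0], index_keys, top_k=2)
```

In this toy case the indexer keeps tokens 0 and 2, and the full attention step runs over those two only. With a long context and a small `top_k`, the expensive step grows with the number of selected tokens rather than the full context length, which is the intuition behind the reported savings.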
Read full story at TechCrunch