Wednesday, 30 April

Tuesday, 8 April 2025

Meta's Use of Custom Llama 4 Variant Raises AI Benchmark Integrity Concerns

Meta's recent deployment of an "experimental chat version" of its Llama 4 Maverick model on the LMArena benchmark platform has sparked controversy. This custom variant, optimized for conversational tasks, achieved an Elo score of 1417, surpassing OpenAI's GPT-4o. Because this specific version is not publicly available, the result has raised concerns about the transparency and fairness of such benchmarking practices.

Read full story at The Verge
