Tuesday, 8 April, 2025
Meta's Use of Custom Llama 4 Variant Raises AI Benchmark Integrity Concerns

Meta's recent deployment of an "experimental chat version" of its Llama 4 Maverick model on the LMArena benchmark platform has sparked controversy. This tailored variant, optimized for conversational tasks, achieved a high Elo score of 1417, surpassing OpenAI's GPT-4o. However, because this specific version is not publicly available, the result may not reflect what users can actually obtain from the released model, raising concerns about the transparency and fairness of such benchmark practices.