Researchers Create ‘Humanity’s Last Exam’ to Test the Limits of Artificial Intelligence

Source Domain: thedebrief.org

Researchers developed “Humanity’s Last Exam,” a new assessment with 2,500 questions covering diverse disciplines, to measure the capabilities of modern AI systems.
Results show that even advanced AI models struggle with the exam: GPT-4 scored 2.7%, and Gemini 3.1 Pro reached only around 40-50% accuracy.
The exam aims to highlight the limitations of AI in areas requiring deep understanding, specialized knowledge, and context beyond simple pattern recognition.
The exam was deliberately designed to be too difficult for current AI systems: nearly 1,000 experts from multiple fields wrote the questions, each of which has a single, verifiable answer.
The initiative seeks to give a clearer picture of AI’s strengths and weaknesses so that policymakers and developers can evaluate AI capabilities accurately.
Humanity’s Last Exam represents a significant effort to establish a comprehensive benchmark for AI, although most questions remain private to maintain the exam’s integrity amid ongoing AI advancements.
