Artificial intelligence ready to score full marks on one of world’s most challenging tests

Source Domain: www.gbnews.com

Google’s Gemini model has achieved 45.9 percent on “Humanity’s Last Exam,” a significant leap from previous performances.
The test, designed to measure the divide between machine learning and human intellect, comprises 2,500 questions across roughly 100 disciplines requiring doctoral-level comprehension.
The test was developed jointly by Scale AI and the Center for AI Safety, drawing from over 70,000 questions proposed by experts from approximately 50 countries.
The benchmark is meant to evaluate both the breadth and depth of AI systems’ knowledge and reasoning, measured against the capability of a universal expert.
AI models’ recent rapid advancement, noted by researchers like Calvin Zhang of Scale, has led to predictions that full marks could be achieved within twelve months.
While some models, like Google’s Gemini and Anthropic’s Claude, show improving performance, others still lag, indicating persistent gaps in AI’s understanding.
Experts such as Dr. Tung Nguyen stress that the test highlights how much human expertise still matters for depth, context, and specialized knowledge.
There is optimism from industry representatives, such as Kate Olszewska, that full marks could be quickly achieved if enough resources and focus are directed toward this goal.
