OpenAI has just released its o3 series of AI models, which focus on reasoning capabilities. So far, the models have performed very well in internal benchmarks and have raised questions about how close such systems are to human-level intelligence. Impressive as the scores are, it remains to be seen whether o3 actually reaches human-level intelligence or is merely an improved version of earlier models.
The o3 model scored 85 percent on the ARC-AGI benchmark, a test that measures reasoning and spatial understanding. That is 30 percentage points higher than the previous best score, and it matches the average human score on the same test. The result may suggest that o3 is close to human-level reasoning, but the reality may be more complex.
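The size of that jump can be stated in two ways, and the two are easy to confuse. The short sketch below uses only the two figures reported in this article (85 percent for o3 and 55 percent for the previous best); the variable names are illustrative.

```python
# Scores as reported in this article (percent correct on ARC-AGI).
o3_score = 85.0
previous_best = 55.0

# Absolute gap between the two scores, in percentage points.
point_gain = o3_score - previous_best

# Relative improvement over the previous best, as a percentage of that score.
relative_gain = (o3_score - previous_best) / previous_best * 100

print(f"Gain: {point_gain:.0f} percentage points")
print(f"Relative improvement: {relative_gain:.1f}%")
```

The gap is 30 percentage points, while the relative improvement over the previous best is considerably larger than 30 percent.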
What Does This Benchmark Score Really Mean?
Despite the stellar score, there is no immediate reason to believe that o3's capabilities are equivalent to a human's. A single benchmark score is not detailed enough to judge the full capabilities of an AI model. OpenAI has not given detailed information about the architecture, methodology, or datasets used to train o3, so it is not possible to determine whether the model can really be called intelligent.
The o3 model is a continuation of the o-series. It improves on the existing architecture rather than overhauling it. For example, the o1 series introduced test-time compute, which allows the model to spend extra time working through solutions and testing theories. GPT-4o was also a fine-tuned version of GPT-4.
Given that OpenAI is reported to be developing GPT-5, the o3 model does not seem to be a new architectural design. Rather, the company has probably been optimizing the model for better performance on reasoning tasks.
The ARC-AGI benchmark consists of grid-based pattern-recognition questions that require logical and spatial reasoning. Good performance on such a test could suggest that a model was trained on a large, robust dataset focused on logic and reasoning. But the test is not that simple, even if it has been thought so before. If training data alone were enough, the previous best score would have been much higher. The previous best was 55 percent, which indicates that o3 has left the others far behind with its well-refined techniques and algorithms.
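To make the idea of a grid-based pattern question concrete, here is a toy puzzle in the same spirit. It is not taken from ARC-AGI: the grids, the hidden rule (mirror each row), and the helper names are all invented for illustration. The solver sees a few input/output pairs, has to work out the rule, and then applies it to a new grid.

```python
# Toy illustration only: grids, rule, and function names are hypothetical.
Grid = list[list[int]]

def mirror_rows(grid: Grid) -> Grid:
    """Candidate rule: flip every row left to right."""
    return [list(reversed(row)) for row in grid]

def identity(grid: Grid) -> Grid:
    """Candidate rule: return an unchanged copy of the grid."""
    return [row[:] for row in grid]

def fits_examples(rule, examples) -> bool:
    """A rule fits only if it reproduces every example output exactly."""
    return all(rule(inp) == out for inp, out in examples)

# Demonstration pairs (input grid, expected output grid).
examples = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0], [0, 0, 4]], [[0, 3, 3], [4, 0, 0]]),
]
test_input = [[5, 0, 0], [0, 0, 6]]

# Keep only the candidate rules that explain all demonstration pairs.
candidates = {"identity": identity, "mirror_rows": mirror_rows}
consistent = [name for name, rule in candidates.items()
              if fits_examples(rule, examples)]

# Apply the surviving rule to the unseen test grid.
prediction = candidates[consistent[0]](test_input)
print(consistent, prediction)
```

In this toy the rule is picked from a fixed list of two candidates. The difficulty of a real benchmark of this kind lies in the fact that the rule is not on any list given in advance and has to be inferred from a handful of examples.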
Is o3 Really Near AGI?
While an 85 percent score is quite impressive, it does not imply that o3 has reached AGI. Achieving AGI would be a far greater breakthrough and would have a very significant impact on OpenAI's partnership with Microsoft. If OpenAI had indeed achieved AGI, the company would probably have made a far more prominent announcement.
AI experts, including Geoffrey Hinton, have also emphasized that AGI remains years away. If the o3 model were close to AGI, OpenAI would likely be more transparent about this milestone.
The o3 model is a step forward in AI development, but the pursuit of true AGI is still a long way off.