Artificial General Intelligence (AGI): How far are we from achieving it?

A while ago, OpenAI released models such as o1, o3, o3-mini and GPT-4o that can reason, think and ponder their responses to prompts as you use them, across multiple use cases.

OpenAI has said on a number of occasions that it is working towards building an AGI (Artificial General Intelligence), and these latest models are a foundation for that.

But first, what is Artificial General Intelligence?

Artificial General Intelligence (AGI) defined

According to IBM, Artificial General Intelligence (AGI) is a hypothetical stage in the development of machine learning (ML) in which an artificial intelligence (AI) system can match or exceed the cognitive abilities of human beings across any task. In simpler terms, it is an AI that is able to think, process information and reason like a human being.

Artificial General Intelligence is still theoretical at the moment, but I do believe that we may reach a point where AI functions like a human being, and that this will have a huge impact on how technology functions, the way we execute work and how we build complex systems.

While OpenAI, Google and DeepSeek have models with reasoning capabilities that can follow a thought process when responding to prompts, we need benchmarks and tests to determine how far we are from achieving AGI. In comes the ARC-AGI test.

ARC-AGI: The test of true Artificial General Intelligence

ARC-AGI, which stands for Abstraction and Reasoning Corpus for Artificial General Intelligence, is a benchmark used to measure intelligence. The ARC-AGI tests consist of puzzle-like problems where an AI has to identify visual patterns from a collection of different-colored squares and generate the correct “answer” grid. Essentially, the problems are designed to force an AI to adapt to new problems it hasn’t seen before. There are currently two benchmark tests:

ARC-AGI-1:

This benchmark was introduced by François Chollet in his 2019 paper On the Measure of Intelligence as a novel benchmark designed to test machine reasoning and general problem-solving skills. It consists of 800 puzzle-like tasks, designed as grid-based visual reasoning problems. These may seem trivial to a human being but are challenging for AI.
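
To make the grid format concrete, here is a minimal Python sketch of an ARC-style task. It assumes the layout used by the public ARC data (a few "train" input/output grid pairs plus a "test" input, with each cell an integer color code). The toy task, the three candidate rules and the function names are all made up for illustration, and a brute-force search over a fixed list of rules is only a teaching device, not how any of the models discussed here work.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # each cell is an integer color code

# Hypothetical toy task: the hidden rule is "mirror every row left to right".
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0], [0, 0, 4]], "output": [[0, 3, 3], [4, 0, 0]]},
    ],
    "test": [{"input": [[5, 0, 0], [0, 0, 6]]}],
}


def identity(grid: Grid) -> Grid:
    return [list(row) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    # Reverse the order of the rows (top becomes bottom).
    return [list(row) for row in reversed(grid)]


def mirror_rows(grid: Grid) -> Grid:
    # Reverse each row (left becomes right).
    return [list(reversed(row)) for row in grid]


# Illustrative candidate rules; a real solver cannot rely on a fixed list.
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_vertical": flip_vertical,
    "mirror_rows": mirror_rows,
}


def fits_all_examples(rule: Callable[[Grid], Grid], task: dict) -> bool:
    # A rule is accepted only if it reproduces every demonstration pair.
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


def solve(task: dict) -> Tuple[Optional[str], List[Grid]]:
    # Return the first rule consistent with the demonstrations,
    # together with its predictions for the test inputs.
    for name, rule in CANDIDATE_RULES.items():
        if fits_all_examples(rule, task):
            return name, [rule(t["input"]) for t in task["test"]]
    return None, []


rule_name, predictions = solve(toy_task)
print(rule_name, predictions)
```

The difficulty of the real benchmark comes from the fact that each task uses a different hidden rule, so there is no fixed list of candidates to search through.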

ARC-AGI-2:

ARC-AGI-2, which was released in March of 2025, is a benchmark that acts as a compass pointing towards useful research directions, a playground for testing few-shot reasoning architectures and a tool to accelerate progress towards AGI.

ARC-AGI-2 tests and studies the AI’s ability to handle:

Symbolic Interpretation: The process of assigning meaning and significance to symbols, objects or actions that appear in a pattern.
Compositional Reasoning: The process of grasping the significance of attributes, relations and order when several elements are combined.
Contextual Rule Application: The ability of a system to dynamically apply rules or logic based on the specific context or situation it is operating in, rather than following fixed or static rules.
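
As a rough illustration of the last item, the Python sketch below contrasts a fixed rule with a context-dependent one. It is not taken from ARC-AGI-2 itself: the convention that the top-left cell acts as a "marker" color, the target color 5 and the function names are all assumptions invented for this example.

```python
from typing import List

Grid = List[List[int]]  # each cell is an integer color code


def recolor_fixed(grid: Grid, target: int = 5, new_color: int = 2) -> Grid:
    # Static rule: every target-colored cell always becomes new_color.
    return [[new_color if cell == target else cell for cell in row] for row in grid]


def recolor_by_marker(grid: Grid, target: int = 5) -> Grid:
    # Contextual rule (hypothetical): the top-left cell is read as a marker,
    # and every target-colored cell takes on the marker's color.
    marker = grid[0][0]
    return [[marker if cell == target else cell for cell in row] for row in grid]


grid_a = [[2, 0, 0], [0, 5, 5]]  # marker color 2
grid_b = [[3, 0, 0], [0, 5, 5]]  # marker color 3

# The same cells are recolored differently depending on the context.
print(recolor_by_marker(grid_a))
print(recolor_by_marker(grid_b))

# The fixed rule ignores the marker and gives the same recoloring both times.
print(recolor_fixed(grid_b))
```

A system that has only memorized "5 becomes 2" handles the first grid but fails on the second, which is the kind of gap this category is meant to expose.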

How far are we from achieving Artificial General Intelligence (AGI)?

When OpenAI launched the o1, o3 and o3-mini models, we viewed them as a next-level step in the “intelligence” part of Artificial Intelligence. To illustrate this, in December of 2024 OpenAI featured ARC-AGI-1 as the leading benchmark for measuring the performance of its experimental o3 model. At low compute, o3 scored 75.7% on ARC-AGI-1, and it reached 87% accuracy with higher compute. This marked the first effective solution of the ARC challenge in over five years.

However, this achievement was short-lived once the newer models were tested on the latest and much more advanced ARC-AGI-2. A recent report stated that reasoning models like OpenAI’s o1 and DeepSeek’s R1 scored between 1% and 1.3% on the test, while non-reasoning models like GPT-4.5, Claude 3.7 Sonnet and Gemini 2.0 Flash scored around 1%.

To establish a human baseline for this benchmark, the test’s organizers had 400 people take the ARC-AGI-2 test. On average, they got 60% of the questions right, which was far better than any of the models’ scores.

These results tell us that we are still far from achieving true Artificial General Intelligence, at least by today’s standards, technology and benchmarks. I actually believe that we may not even see AGI in our lifetime. Our children or their children may see it achieved if AI keeps advancing at the rate it is now.

But what would it mean for the world if we eventually achieved AGI?

What would achieving AGI mean for the world?

Those who have watched movies like Atlas, Terminator, Echelon Conspiracy or other AI-related films that depict just how dangerous a self-aware AI can be may have seen what could happen when it is not moderated or controlled: an AGI may become so self-aware that the humanoid robots being developed by the likes of 1X Technologies, Clone Robotics, Tesla and Boston Dynamics come alive and take over all of humanity (this is bordering on science fiction, yes, but we’ve all seen Terminator, right?). That is potentially what may happen should AGI become too self-aware.

But on the flip side, AGI may also help us solve a lot of human mysteries, develop elusive cures for diseases in a fraction of a second and quadruple productivity in factories, businesses and academia. It may even help solve computing problems and actually perfect quantum computing, challenging conventional thinking, because it would be able to predict and pre-empt changes on the fly at rates that no human could.

But this is a future still being explored. It may not be a reality with the current AI models we have and, like I mentioned, it may not even become a reality during our lifetime.