Apple researchers: Mainstream AI models still fall short of the reasoning expected for AGI.
Gate News bot report: In a paper titled "The Illusion of Thinking," published in June, researchers at Apple pointed out that leading artificial intelligence (AI) models still struggle with reasoning, suggesting the race to develop artificial general intelligence (AGI) still has a long way to go.
The article notes that recent updates to mainstream large language models (LLMs), such as OpenAI's ChatGPT and Anthropic's Claude, have added large reasoning models (LRMs), but that their fundamental capabilities, scaling properties, and limitations "are still not fully understood."
Current evaluations focus mainly on established mathematical and coding benchmarks, "emphasizing the accuracy of the final answer." The researchers argue that this kind of assessment reveals little about the models' actual reasoning capabilities, standing in stark contrast to expectations that artificial general intelligence could be achieved within just a few years.
The researchers designed a series of puzzle games that go beyond standard mathematical benchmarks to test the "thinking" and "non-thinking" variants of Claude Sonnet, OpenAI's o3-mini and o1, and the DeepSeek-R1 and V3 chatbots.
They found that "state-of-the-art large reasoning models (LRMs) suffer a complete collapse in accuracy beyond a certain level of complexity," failing to generalize their reasoning effectively, with their advantage diminishing as complexity increases, contrary to expectations about the capabilities of artificial general intelligence (AGI).
Source: Cointelegraph