Zhipu Founder Tang Jie: AI Large Models Show Rapid Improvement on "Humanity's Last Exam"

Deep News
01/10

At the AGI-Next Frontier Summit, co-hosted by a Beijing key laboratory at Tsinghua University and Zhipu AI, Tang Jie, a professor at Tsinghua University and the founder of Zhipu, pointed out that since 2025, AI large models have begun to show rapid improvement on Humanity's Last Exam (HLE), a highly challenging benchmark for evaluating intelligence.

Tang Jie noted that back in 2020, AI large models could only handle basic benchmarks such as MMLU and simple QA. In 2021-2022, through post-training, they began to acquire mathematical reasoning capabilities (addition, subtraction, multiplication, and division), closing a fundamental gap in reasoning. From 2023 to 2024, large models evolved from knowledge memorization to complex reasoning, starting to tackle graduate-level problems and real-world programming tasks such as SWE-bench, mirroring the human progression from elementary school to the professional workplace. In 2025, model performance on Humanity's Last Exam is advancing rapidly; the test includes extremely obscure questions that cannot simply be looked up on Google, so models must possess strong generalization abilities.

"There has always been a desire for machines (AI) to have generalization capabilities, where teaching them a little enables them to infer much more," Tang Jie stated. Although the generalization ability of AI today still needs significant improvement, Zhipu, and indeed the entire industry, is actively working to enhance it through a series of methods.

Around 2020, the industry, building on the Transformer architecture, strengthened models' long-term knowledge retention by scaling up data and compute, enabling direct recall of basic knowledge (such as answering "What is the capital of China?"). Around 2022, the focus shifted to alignment and reasoning optimization to improve complex reasoning and intent understanding; the core methods were continually scaling supervised fine-tuning (SFT) and reinforcement learning, relying on vast amounts of human feedback data to improve model accuracy. By 2025, the field has begun experimenting with building verifiable environments in which machines can explore autonomously, gather feedback data for self-improvement, and strengthen generalization, thereby addressing problems such as the high noise and limited scenario coverage inherent in traditional human feedback data.
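To make the contrast concrete, the sketch below illustrates the verifiable-environment idea in Python: the reward for a model's attempt comes from a programmatic check against ground truth rather than from human ratings, so correct attempts can be collected automatically for further training. This is a minimal, hypothetical illustration; the names (`Task`, `verify`, `propose_answer`, `collect_verified_data`) and the rejection-sampling-style collection loop are assumptions made for exposition, not anything described by Tang Jie or used in Zhipu's systems.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    question: str   # e.g. "17 * 24"
    expected: int   # ground truth that a checker can verify exactly


def verify(task: Task, answer: int) -> float:
    """Reward comes from a programmatic check, not a human rater: 1.0 if correct."""
    return 1.0 if answer == task.expected else 0.0


def propose_answer(task: Task) -> int:
    """Stand-in for sampling an answer from a model; wrong roughly 30% of the time."""
    if random.random() < 0.7:
        return task.expected
    return task.expected + random.randint(1, 5)


def collect_verified_data(tasks: List[Task], attempts_per_task: int = 4) -> List[Tuple[str, int]]:
    """Autonomously explore each task and keep only attempts the environment verifies."""
    kept = []
    for task in tasks:
        for _ in range(attempts_per_task):
            answer = propose_answer(task)
            if verify(task, answer) == 1.0:
                kept.append((task.question, answer))
    return kept


if __name__ == "__main__":
    tasks = [Task("17 * 24", 408), Task("305 - 128", 177), Task("12 + 35", 47)]
    data = collect_verified_data(tasks)
    print(f"collected {len(data)} verified (question, answer) pairs for further training")
```

The point of the design is simply that the verification step replaces noisy human preference labels with exact, automatically computed feedback, which is what lets such an exploration loop scale across many more scenarios than human annotation can cover.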

