Zhipu Founder Tang Jie: AI Large Models Show Rapid Improvement on "Humanity's Last Exam"

Deep News
01/10

At the AGI-Next Frontier Summit, co-hosted by a key Beijing laboratory of Tsinghua University and Zhipu AI, Tang Jie, a professor at Tsinghua University and founder of Zhipu, pointed out that since 2025, AI large models have shown rapidly improving performance on Humanity's Last Exam (HLE), a highly challenging benchmark for evaluating intelligence.

Tang Jie noted that back in 2020, AI large models could only handle basic problems such as MMLU-style knowledge questions and simple QA. By 2021-2022, through post-training, they began to acquire basic arithmetic reasoning (addition, subtraction, multiplication, and division), closing a fundamental reasoning gap. From 2023 to 2024, large models moved from knowledge memorization to complex reasoning, starting to tackle graduate-level problems and real-world programming tasks such as SWE-bench, mirroring the human progression from elementary school to the professional workplace. In 2025, model performance on Humanity's Last Exam is advancing rapidly; the test includes extremely obscure questions that cannot simply be looked up on Google, so models must possess strong generalization abilities.

"There has always been a desire for machines (AI) to have generalization capabilities, where teaching them a little enables them to infer much more," Tang Jie stated. Although the generalization ability of AI today still needs significant improvement, Zhipu, and indeed the entire industry, is actively working to enhance it through a series of methods.

Around 2020, the industry, building on the Transformer architecture, strengthened models' long-term knowledge retention by scaling up data and compute, enabling direct recall of basic knowledge (such as answering "What is the capital of China?"). By around 2022, the focus shifted to alignment and reasoning optimization to improve complex reasoning and intent understanding, with the core methods being ever-larger supervised fine-tuning (SFT) and reinforcement learning, which rely on vast amounts of human feedback data to improve model accuracy. By 2025, the field is beginning to build verifiable environments in which machines explore autonomously and collect feedback data for self-improvement, strengthening generalization while sidestepping the high noise and limited coverage of traditional human feedback data.
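To make the idea of a verifiable environment more concrete, below is a minimal sketch of the general approach, not Zhipu's actual pipeline. The names (make_task, verify, ToyPolicy) are hypothetical illustrations rather than any real framework's API: a programmatic verifier, instead of a human annotator, scores each attempt, and the verified reward drives self-improvement.

```python
import random

# Minimal sketch of learning from verifiable rewards in a toy environment.
# All names here are hypothetical; this is an illustration of the concept,
# not any specific lab's implementation.

def make_task():
    """A verifiable environment: a task whose answer can be checked programmatically."""
    a, b = random.randint(1, 99), random.randint(1, 99)
    return {"prompt": f"{a} + {b} = ?", "target": a + b}

def verify(task, answer):
    """Programmatic verifier replaces noisy human feedback: reward 1.0 only if exactly correct."""
    return 1.0 if answer == task["target"] else 0.0

class ToyPolicy:
    """Stands in for a language model; 'skill' is its probability of answering correctly."""
    def __init__(self, skill=0.2):
        self.skill = skill

    def answer(self, task):
        if random.random() < self.skill:
            return task["target"]                               # correct attempt
        return task["target"] + random.choice([-2, -1, 1, 2])   # near miss

    def update(self, reward, lr=0.01):
        # Crude stand-in for a policy update: verified successes
        # nudge the policy toward better behavior.
        self.skill = min(1.0, self.skill + lr * reward)

if __name__ == "__main__":
    policy = ToyPolicy()
    for step in range(2000):
        task = make_task()
        reward = verify(task, policy.answer(task))  # feedback comes from the verifier, not a human
        policy.update(reward)
    print(f"final skill estimate: {policy.skill:.2f}")
```

The design point the sketch tries to show is that the environment itself generates unlimited, noise-free feedback, which is what makes autonomous exploration and self-improvement practical compared with collecting human preference data.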

