Industry professionals have dubbed 2025 the "Year of AI Agents." Yet while OpenAI's ChatGPT Agent and the star startup Manus have generated global excitement, their agents remain of limited practical use. Four major obstacles still stand between users and the ideal AI agent: subpar delivery quality, slow response times, no way to intervene mid-process, and insufficient expertise. Manually salvaging AI-generated work often becomes the last resort.
This is both the current predicament of AI agents and the opportunity identified by Wang Ying, head of the Baidu Library and Baidu Cloud Storage divisions. Since the large-model boom began in late 2022, both Library and Cloud Storage have been progressively rebuilt around AI. Two years on, they have evolved from document viewing and storage tools into comprehensive content acquisition, creation, and service platforms, earning recognition from Baidu founder Robin Li as "the most thoroughly AI-reconstructed products."
On August 18, Library and Cloud Storage jointly launched the universal AI agent "GenFlow 2.0," which they describe as the world's first fully universal agent accessible across all endpoints. Users can access it directly through Baidu Library's web interface and mobile app, with no invitation code or waiting list.
An AI agent is, most simply, an AI assistant: it takes on part of the human workload by understanding and carrying out instructions. GenFlow 2.0 thinks through and plans each task, then autonomously calls on the appropriate models and agents for PPT, documents, mind maps, posters, and more, ultimately delivering multimodal content to the user.
For example, when users input "design a Sanrio blind box," the system instantly outputs 3D modeling images of Hello Kitty and Cinnamoroll characters. Ask it to "create a 7-day, 6-night suburban Beijing travel plan for National Day holiday," and it generates a PDF embedded with maps of recommended attractions. Users can also directly access their personal Baidu Library or cloud storage materials for customized company research reports.
According to live demonstrations, GenFlow 2.0 operates like having over 100 professional agents forming an "expert team" working simultaneously, completing 5-6 complex tasks in parallel within minutes.
"GenFlow 2.0 is not just a multi-agent collaborative scheduling system; it's more like a human AI expert team," Wang Ying emphasized during the launch event.
Addressing prevalent pain points in the current AI Agent industry, the Library and Cloud Storage teams have made technical and product breakthroughs in four key areas over the past six months:
First, to tackle "slow response times and excessive user wait times," GenFlow 2.0 uses a proprietary Multi-Agent infrastructure that processes tasks in parallel rather than serially. Traditional agents work like an assembly line: task B begins only after task A completes. GenFlow's Multi-Agent infrastructure lets hundreds of experts work at once. When generating an industry research report, for instance, the data collection, chart creation, competitive analysis, and PPT formatting agents all start together, like workers "simultaneously assembling express packages," which compresses completion time to minutes.
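The difference between serial and parallel scheduling can be illustrated with a minimal Python sketch. It is not GenFlow's implementation: the agent names come from the example above, the delays are made-up stand-ins for model and tool calls, and the subtasks are assumed to be independent of one another.

```python
import asyncio
import time

# Hypothetical sub-agents; the delays (in seconds) are illustrative only.
AGENTS = {
    "data_collection": 0.05,
    "chart_creation": 0.05,
    "competitive_analysis": 0.05,
    "ppt_formatting": 0.05,
}

async def run_agent(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stands in for model or tool calls
    return f"{name}: done"

async def run_serial() -> list[str]:
    # Assembly-line style: each agent starts only after the previous one ends.
    return [await run_agent(n, s) for n, s in AGENTS.items()]

async def run_parallel() -> list[str]:
    # All agents start together; gather() returns results in input order.
    return list(await asyncio.gather(*(run_agent(n, s) for n, s in AGENTS.items())))

def timed(coro_fn):
    """Run an async entry point and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn())
    return results, time.perf_counter() - start

if __name__ == "__main__":
    for label, fn in (("serial", run_serial), ("parallel", run_parallel)):
        _, elapsed = timed(fn)
        print(f"{label}: {elapsed:.2f}s")
```

With independent subtasks, the serial run takes about the sum of the delays while the parallel run takes about the longest single delay. In a real report pipeline some steps depend on others (charts need data), so only part of the work can overlap.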
Second, to address "subpar task delivery quality and insufficient output expertise," GenFlow 2.0 can call on more than 100 multimodal agents that form an "AI expert team" and generate PPT presentations, research reports, video storybooks, posters, images, charts, HTML, code, games, websites, and other multimodal content in parallel. The Library team says it annotates every schedulable agent with its specialized capabilities, the rate at which users adopt its results, and users' copying and downloading behavior, and uses these signals to set system weights that influence how likely each agent is to be activated in the future. Previously, Baidu Library's "PPTAgent" achieved over 34 million global monthly visits.
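One way such behavior-based weighting could work is sketched below in Python. The article does not disclose the actual formula, so the linear score, its coefficients, and the sample agent names are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class AgentStats:
    name: str
    adoption_rate: float   # share of results users accepted (0..1)
    copy_rate: float       # share of results users copied (0..1)
    download_rate: float   # share of results users downloaded (0..1)

def weight(stats: AgentStats, coeffs=(0.5, 0.2, 0.3)) -> float:
    # Assumed linear combination; the real weighting scheme is not public.
    a, c, d = coeffs
    return a * stats.adoption_rate + c * stats.copy_rate + d * stats.download_rate

def activation_probabilities(agents: list[AgentStats]) -> dict[str, float]:
    weights = {s.name: weight(s) for s in agents}
    total = sum(weights.values())
    if total == 0:
        # No signal yet: fall back to a uniform choice.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}

def pick_agent(agents: list[AgentStats], rng: random.Random) -> str:
    probs = activation_probabilities(agents)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

# Hypothetical candidates competing for the same slide-making task.
candidates = [
    AgentStats("ppt_agent_a", adoption_rate=0.8, copy_rate=0.5, download_rate=0.6),
    AgentStats("ppt_agent_b", adoption_rate=0.4, copy_rate=0.2, download_rate=0.1),
]
```

Sampling from the probabilities, rather than always picking the top scorer, keeps lower-ranked agents in rotation so their statistics can still be updated.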
Third, to address the "inability to intervene mid-process," GenFlow 2.0 enables real-time intervention in the agent's thinking. "We discovered users' greatest frustration: watching AI 'think incorrectly' without being able to correct it mid-course," Wang Ying stated. GenFlow 2.0's answer is to expose the AI's thinking process and let humans step in. In what the team calls an industry first, a task can be paused, interrupted, and given additional instructions at any point during execution. When users see an agent's reasoning drifting away from their intent, they can intervene and adjust immediately instead of waiting for the task to finish and then redoing unsatisfactory work.
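A simple way to picture pausable, correctable execution is a plan that runs one step at a time and checks for user input between steps. The Python sketch below assumes that model; the class and method names are hypothetical, not GenFlow's API.

```python
from collections import deque

class InterruptibleTask:
    """Runs a plan step by step so a user can step in between steps."""

    def __init__(self, steps):
        self.pending = deque(steps)
        self.log = []          # the visible trace of what has run so far
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def supplement(self, instruction):
        # A user correction is handled before any remaining planned step.
        self.pending.appendleft(f"revise: {instruction}")

    def cancel(self):
        self.pending.clear()

    def step(self):
        """Run one step; return False when paused or finished."""
        if self.paused or not self.pending:
            return False
        self.log.append(self.pending.popleft())
        return True

task = InterruptibleTask(["outline", "draft", "format"])
task.step()                          # runs "outline"
task.pause()                         # user notices the direction is off
task.supplement("focus on 2024 data")
task.resume()
while task.step():
    pass
```

The key design choice is that control returns to the caller after every step, which is what makes pausing and mid-course correction possible without discarding finished work.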
At the same time, GenFlow 2.0 can understand user intent in depth and switch collaboration modes on its own. For simple questions like "what is 999 squared" or "what concerts are in Shanghai in August," it calls on a single agent for a direct answer, avoiding "using a sledgehammer to crack a nut." For complex tasks, users can submit their requirements together with files in multiple formats, and GenFlow 2.0 plans the work and schedules the appropriate team of expert agents to meet the task's demands.
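The mode switch can be pictured as a router that sits in front of the agents. The sketch below uses a crude keyword heuristic purely as a placeholder; a production system would presumably use a model to judge intent, and every name and keyword here is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    text: str
    attachments: list[str] = field(default_factory=list)

# Hypothetical keywords hinting that the user wants a multi-step deliverable.
DELIVERABLE_HINTS = ("report", "ppt", "presentation", "poster", "plan", "website")

def choose_mode(req: Request) -> str:
    """Return "single_agent" for quick questions, "multi_agent" otherwise."""
    text = req.text.lower()
    wants_deliverable = any(hint in text for hint in DELIVERABLE_HINTS)
    if req.attachments or wants_deliverable:
        return "multi_agent"
    return "single_agent"
```

For example, `choose_mode(Request("what is 999 squared"))` selects the single-agent path, while a request that includes uploaded files or asks for a travel plan is routed to the multi-agent path.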
In developing GenFlow 2.0, the Library and Cloud Storage team established a direct standard: user satisfaction. "Agents exist to complete work. Whether users are satisfied, whether they adopt the results, whether they pay—this provides an excellent standard. Current technical and team capabilities aren't the issue; the key is not having an oversized ego. If you believe user needs don't exist and your needs represent user needs, that's problematic."
Currently, GenFlow 2.0 has comprehensively integrated Baidu's ecosystem resources. For example, after user authorization, it can retrieve and call upon designated materials stored in Baidu Cloud Storage; for travel guides or address searches, it can utilize Baidu Maps tools; for academic research tasks, it supports deep web searches, directly accessing Baidu Academic's 680 million literature database and Library's 1.4 billion professional content repository.
GenFlow 2.0 stems from the Library's "AI reconstruction" initiative launched two years ago. "The Library team's strategic direction was building a productivity closed loop for the AI era," Wang Ying previously stated.
For users, if Agents represent production lines, producing results still requires data, documents, PPTs, and other production materials. These represent the accumulated strengths of Baidu Library and Cloud Storage teams.
"Before AI, we were simply a document search platform. Users' creation chains were lengthy—from ideation to information gathering, integration, editing, and completion. We only handled one segment, but this track had limited space, just a market worth billions. With large models, we gained capability to serve users' complete workflows," Wang Ying explained.
Following AI large model reconstruction, Baidu Library positions itself as a "one-stop AI content acquisition and creation platform," possessing over 1.4 billion professional content resources and launching hundreds of multimodal AI agents including intelligent PPT, intelligent writing, AI storybooks, industry research reports, AI web search, intelligent posters, intelligent comics, intelligent novels, and super contracts.
Simultaneously, Baidu Cloud Storage has upgraded to a "one-stop content service platform," launching AI notes, simple audio transcription, simple scanning, simple printing, and other AI functions covering user needs across entertainment, learning, office work, and family education scenarios. Currently, Baidu Cloud Storage serves over 1 billion users with over 200 million monthly active users, over 80 million AI monthly active users, and total storage space exceeding 100 billion GB.
Wang Ying has previously disclosed that the multimodal notes product jointly launched by Baidu Cloud Storage and Library reached millions of daily active users within just 20 days, while their AI camera took only 9 days to complete.
The combination of Baidu Library and Baidu Cloud Storage creates an exclusive resource pool of "professional public domain data + authorized private domain data," ultimately serving as raw materials for Agent production lines.
In April this year, Baidu Library and Baidu Cloud Storage launched the content operating system "Cangzhou OS," which Robin Li called "the world's first operating system in the content domain," and released "GenFlow 1.0" on top of it.
"Cangzhou OS is an extremely complex production line with numerous tools. But how it initiates work after users issue commands, which tools work together to meet your needs—that's handled by GenFlow's scheduling system," Wang Ying explained.
Wang Ying likens their collaborative logic to "Transformers stacking": users only need to talk to one unified interface, the "Optimus Prime" that is GenFlow 2.0, while hundreds of professional "Transformers" collaborate and transform behind the scenes.
When AI Agents can solve problems end-to-end, their industry value shifts from singular "functional points" to direct "delivery results." Simultaneously, AI Agents evolve from individual tools to reconstructors of production relationships.
In April 2025, Baidu announced full compatibility with MCP (Model Context Protocol), an open standard introduced by Anthropic to unify communication between large language models and external data sources and tools. MCP gives agents a "common language" and shared "communication rules," and is often likened to TCP/IP, HTTP, or a universal interface for AI agent interaction.
GenFlow 2.0 also supports MCP, enabling flexible integration with third-party service ecosystems. Honor has become the first hardware manufacturer to join this MCP ecosystem, deeply integrating GenFlow 2.0 into its Magic OS. Through YOYO, the assistant on Honor phones, users can reach their personal cloud storage and Library resources with one tap and get a native, system-level experience for file summarization, PPT generation, travel planning, and other scenarios.
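To make the "common language" concrete: MCP messages follow JSON-RPC 2.0, and a client asks a server to run one of its tools with the `tools/call` method. The Python sketch below only builds such a request. The tool name and arguments are hypothetical and do not describe any tool Baidu actually exposes.

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool and arguments, for illustration only.
request = build_tool_call(
    1,
    "search_documents",
    {"query": "industry research report", "limit": 5},
)
wire_message = json.dumps(request)
```

Because every tool is invoked through the same message shape, a client such as a phone assistant can add a new service without writing a bespoke integration for it.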
In Baidu Library and Cloud Storage teams' vision, future AI should be "omnipotent and omnipresent." Whether on phones, tablets, IoT devices, or broader life scenarios, users can call upon AI Agent collaboration anytime.
More importantly, as AI agents keep expanding across ecosystems and scenarios, the barrier to creation for ordinary people will fall from writing professional commands to holding a natural, everyday conversation, opening up more opportunities for commercial growth.
As Robin Li stated at the 2025 Baidu Create conference: "Now, developing intelligent agents based on MCP is like developing mobile apps in 2010."
Reports from EO Intelligence show the AI Agent market continues expanding, projected to grow from 57.4 billion yuan in 2023 to 3.3009 trillion yuan in 2028.
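For a sense of scale, the compound annual growth rate implied by those two figures can be checked with a few lines of Python. The calculation assumes five compounding years (2023 to 2028) and uses only the numbers quoted above, in billions of yuan.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (1.0 means 100% per year)."""
    return (end_value / start_value) ** (1 / years) - 1

# 57.4 billion yuan in 2023 to 3,300.9 billion yuan (3.3009 trillion) in 2028.
growth = cagr(57.4, 3300.9, 2028 - 2023)
print(f"Implied CAGR: {growth:.1%}")
```

The result is roughly 125% per year, meaning the projection has the market more than doubling every year over the period.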
The era where "everyone is a super individual" may be accelerating with GenFlow 2.0.
"When users discover freer expression, intervening processes, and more reliable delivery, AI truly becomes a trustworthy collaborative partner," Wang Ying describes the ultimate coexistence between humans and AI.