Alibaba's Resolve in Cloud Computing is Still Underestimated

Alibaba Cloud has undergone a comprehensive, full-stack reconstruction spanning chips, cloud architecture, models, inference platforms, and product entry points. Over the past year, outside discussion of Alibaba Cloud and Alibaba Group has largely centered on one thing: its unprecedented capital investment and heavy bet on AI infrastructure. Some observers have even expressed confusion: does building a cloud foundation for the AI era really require so much money? Is Alibaba being overly aggressive, trying to boost its stock price with an AI capital narrative? These questions are not wrong in themselves, but they rest on a premise: that Alibaba Cloud should still be evaluated with an old framework of market share, growth rate, and the gap with AWS and Azure.

The answer provided by Alibaba Cloud at its May 20th summit lies outside this framework. Alibaba Cloud Senior Vice President Liu Weiguang stated that after Agents surpass a critical point, they can operate 24/7 without interruption, creating an endless demand for AI and cloud. Alibaba Cloud is undergoing a full-stack technological revolution, comprehensively upgrading from underlying chips and Agentic Cloud to models and inference platforms, aiming to build China's largest AI factory. Coincidentally, the core theme of Google's concurrent I/O conference was also Agents. Google is integrating Agents into all its core entry points, from search boxes and Chrome browsers to Android phones and smart glasses. Gemini is no longer just a conversational assistant but an AI agent that can run continuously and execute tasks across applications. AWS and Microsoft Azure are similarly reshaping their businesses and infrastructure foundations based on Agent logic.

The world's leading cloud providers, who are also top players in large models, have reached a tacit understanding: the old cloud cannot support future Agents; infrastructure needs to be rebuilt for Agents. Previously, most vendors followed a route of adding an Agent layer on top of existing architectures, with limited changes to the underlying infrastructure. Now, Alibaba Cloud is truly integrating cloud, chips, and models into a cohesive combination.

Understanding Alibaba Cloud's reconstruction hinges on a key judgment: the load characteristics of Agents and traditional cloud computing follow two completely different logics. Traditional cloud computing typically involves steady-state loads. An enterprise purchases an ECS instance to run a website or database, with relatively predictable traffic and long-term resource occupancy. Consequently, cloud providers' business models are designed around resource leasing, with computing, storage, and networking being the three pillars of the cloud business.

However, an Agent's operational mode is entirely different. When executing a task, an Agent might initiate dozens of model calls within milliseconds, destroy the environment immediately after task completion, and be reactivated minutes or even seconds later. Its load characteristics are irregular and bursty, spiking instantaneously within short lifecycles and then disappearing. Superficially, an Agent calls a model; in reality, it engages an entire AI full-stack system. It also requires a sandbox environment to run code, a database to store intermediate states, and network access to external tools. A single Agent task execution involves the coordinated scheduling of various resources like computing, storage, networking, and model inference. The complexity of cloud computing in the old and new eras differs by orders of magnitude.
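The contrast between steady-state and bursty load can be made concrete with a peak-to-average ratio. The following is a minimal sketch in Python; the two traces are hypothetical, illustrative numbers, not measurements from Alibaba Cloud.

```python
def peak_to_average(requests_per_second):
    """Return the ratio of peak load to mean load for a request trace."""
    avg = sum(requests_per_second) / len(requests_per_second)
    if avg == 0:
        raise ValueError("trace contains no load")
    return max(requests_per_second) / avg

# Hypothetical traces, one sample per second (illustrative numbers only).
steady_site = [100, 102, 98, 101, 99, 100]  # website-like, steady-state load
agent_task = [0, 0, 48, 0, 0, 0]            # one burst of model calls, then idle

steady_ratio = peak_to_average(steady_site)
bursty_ratio = peak_to_average(agent_task)

print(f"steady: {steady_ratio:.2f}x, bursty: {bursty_ratio:.2f}x")
```

A ratio near 1 suits long-lived leased resources. A high ratio means capacity provisioned for the peak sits idle most of the time, which is why bursty Agent workloads favor fast, elastic provisioning.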

Liu Weiguang mentioned that after this year's Spring Festival, following the launch of lobster-like Agent products, Alibaba Cloud observed an interesting phenomenon: previously, enterprises needed human operators to log into the console to manually activate cloud resources. Now, Agents automatically activate cloud computing resources in the background. "The cloud resource service provisioning that an Agent can complete in minutes might have taken us humans days to accomplish," Liu said. Cloud providers cannot afford to ignore this: Agents are becoming the new interface for cloud computing. Alibaba Cloud's conclusion is that the primary users of future cloud computing products will gradually shift from human engineers to Agents.

This judgment permeates Alibaba Cloud's entire reconstruction logic. To make the cloud truly usable by Agents, Alibaba Cloud has transformed its cloud products along three dimensions: Skill-ization, MCP-ization, and CLI-ization. Simply put, each cloud product becomes a standardized capability module that Agents can call like a function. Traditional cloud product consoles are human-friendly but meaningless to Agents, which need structured capability descriptions and clear invocation protocols. Alibaba Cloud calls this system "Agentic Cloud" to distinguish it from the "AI Native Cloud" of recent years, which served large model training and inference. The difference is that AI Native Cloud focuses on model production and iteration, providing elastic and efficient scheduling of computing power, whereas Agentic Cloud targets the Agent runtime, offering a full suite of capabilities including sandboxes, AI gateways, memory management, security protection, and orchestration governance.
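The idea of a cloud product as a capability module that an Agent calls like a function can be sketched as a small registry pairing a structured description with an invocation entry point. Everything here is hypothetical: `Capability`, `CapabilityRegistry`, and `create_sandbox` are illustrative names, not Alibaba Cloud APIs, and the handler is a local stand-in rather than a real cloud call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Capability:
    """A hypothetical capability module: structured description plus handler."""
    name: str
    description: str
    parameters: Dict[str, str]  # parameter name -> type name
    handler: Callable[..., Any]

class CapabilityRegistry:
    def __init__(self) -> None:
        self._caps: Dict[str, Capability] = {}

    def register(self, cap: Capability) -> None:
        self._caps[cap.name] = cap

    def describe(self) -> list:
        """Structured descriptions an Agent could read to discover capabilities."""
        return [
            {"name": c.name, "description": c.description, "parameters": c.parameters}
            for c in self._caps.values()
        ]

    def invoke(self, name: str, **kwargs: Any) -> Any:
        """Validate argument names against the description, then call the handler."""
        cap = self._caps[name]
        if set(kwargs) != set(cap.parameters):
            raise ValueError(f"{name} expects parameters {sorted(cap.parameters)}")
        return cap.handler(**kwargs)

def create_sandbox(image: str, ttl_seconds: int) -> dict:
    # Stand-in for a real provisioning call; returns a fake resource record.
    return {"sandbox_id": f"sbx-{image}", "ttl_seconds": ttl_seconds, "status": "running"}

registry = CapabilityRegistry()
registry.register(Capability(
    name="create_sandbox",
    description="Start a short-lived sandbox for running code.",
    parameters={"image": "string", "ttl_seconds": "integer"},
    handler=create_sandbox,
))

result = registry.invoke("create_sandbox", image="python3", ttl_seconds=60)
print(registry.describe()[0]["name"], result["status"])
```

The point of the design is that the Agent never needs a console: it reads `describe()` to learn what exists and calls `invoke()` with named arguments.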

Years ago, when cloud providers engaged in AI, they mainly sold computing resources to model companies for training and inference. Now, what Alibaba Cloud aims to do is make the cloud itself the operating system for Agent runtime.

If Agentic Cloud is Alibaba Cloud's answer at the architectural level, then chips are the physical foundation of this answer. At this summit, Alibaba Cloud announced its roadmap for self-developed chips. T-Head released the new-generation training-inference integrated AI chip, Zhenwu M890, featuring 144GB of video memory, 800GB/s inter-chip interconnect bandwidth, and performance three times that of the previous generation Zhenwu 810E. The accompanying ICN Switch 1.0 interconnect chip can combine 128 AI chips into a super-node server with P2P latency below 150 nanoseconds. According to reports, T-Head will successively launch the more powerful Zhenwu V900 and Zhenwu J900 chips over the next two years. This likely means Alibaba Cloud's chip iteration rhythm aligns with its model iteration rhythm, with each generation's performance improvement directly translating into leaps in large model training and inference capabilities.
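For a sense of scale, the two figures quoted above (128 chips per super-node, 144GB per chip) can be combined in a back-of-the-envelope calculation. This is a rough sketch under stated assumptions: memory is treated as simply additive across chips, GB is taken as 10^9 bytes, and the 1,000-billion-parameter model at 2 bytes per parameter is a hypothetical example, not a figure from the article.

```python
import math

CHIPS_PER_SUPERNODE = 128  # from the article
MEMORY_PER_CHIP_GB = 144   # from the article

def aggregate_memory_gb(chips, memory_per_chip_gb):
    """Total memory if every chip's memory is counted once (no overhead)."""
    return chips * memory_per_chip_gb

def min_chips_for_weights(params_billions, bytes_per_param, memory_per_chip_gb):
    """Lower bound on chips needed to hold model weights alone.

    Ignores activations, KV cache, and any replication, so real
    deployments would need more than this.
    """
    weights_gb = params_billions * bytes_per_param  # 1e9 params * bytes / 1e9
    return math.ceil(weights_gb / memory_per_chip_gb)

total_gb = aggregate_memory_gb(CHIPS_PER_SUPERNODE, MEMORY_PER_CHIP_GB)

# Hypothetical model: 1,000 billion parameters stored at 2 bytes each.
chips_needed = min_chips_for_weights(1000, 2, MEMORY_PER_CHIP_GB)

print(total_gb, chips_needed)
```

Under these assumptions a single super-node pools 18,432GB of memory, and the hypothetical model's weights alone would occupy at least 14 of its 128 chips.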

Currently, the Zhenwu series AI chips have cumulatively shipped 560,000 units, serving over 400 customers across more than 20 industries, including telecommunications, automotive, and finance. Combined with Alibaba's self-developed Yitian series CPUs, Panmai smart network cards, and Zhenyue storage controller chips, Alibaba's chip portfolio has evolved from single-point breakthroughs to comprehensive coverage. Its data center chip matrix spanning computing power, networking power, and storage power is unique among domestic cloud providers. Liu Weiguang repeatedly emphasized the logic of chip-cloud-model-inference integration: "The final result presented to customers today is the combined effect of gear meshing—a complete organic integration of model capabilities, chip capabilities, and cloud capabilities."

Between chips and models, the Bailian inference platform acts as the "production workshop." Alibaba Cloud has built a large-scale GPU resource cluster on Bailian and addresses the particular challenges of inference with a technology stack tailored for Agent scenarios. Pooled scheduling unifies GPU resource management and improves overall utilization. Context caching eliminates redundant computation for Agents in multi-turn dialogues and long-chain tasks. Throughput elasticity scheduling handles the peaks and troughs of concurrent Agent requests, so the system neither crashes during traffic surges nor wastes resources during quiet periods. More notable is the Agentic RL mechanism, which applies reinforcement learning to feedback from Agents' actual execution, so that models improve with use in real scenarios and form a continuously iterating closed loop. Bailian also has built-in security governance, which is critical when Agents operate autonomously: an Agent running tasks 24/7 without boundary constraints could have uncontrollable consequences. Bailian's security mechanisms ensure Agents always operate within preset permission boundaries.
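Of the techniques listed, context caching is the easiest to illustrate. The sketch below is a conceptual toy, not Bailian's implementation: real inference engines typically reuse attention key/value state on the accelerator, whereas here `PrefixCache` and `fake_encode` are hypothetical stand-ins that memoize an opaque "state" per conversation prefix.

```python
import hashlib

class PrefixCache:
    """Reuse work done on a conversation prefix across turns (toy version)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(messages):
        joined = "\x1f".join(messages)  # unit separator between messages
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def encode(self, messages, expensive_encode):
        """Process messages, skipping the longest prefix already cached."""
        state, start = None, 0
        for n in range(len(messages), 0, -1):
            key = self._key(messages[:n])
            if key in self._store:
                state, start = self._store[key], n
                self.hits += 1
                break
        else:
            self.misses += 1
        # Only the uncached suffix is passed to the expensive step.
        for i in range(start, len(messages)):
            state = expensive_encode(state, messages[i])
            self._store[self._key(messages[: i + 1])] = state
        return state

calls = []

def fake_encode(state, message):
    # Stand-in for model computation; records each message it processes.
    calls.append(message)
    return (state or 0) + len(message)

cache = PrefixCache()
turn1 = ["system: you are an agent", "user: list my instances"]
turn2 = turn1 + ["tool: 3 instances found"]

cache.encode(turn1, fake_encode)           # processes both messages
final = cache.encode(turn2, fake_encode)   # processes only the new message

print(len(calls), cache.hits, cache.misses)
```

In a long Agent task, each step appends to an ever-growing context, so skipping the shared prefix is where the savings come from.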

Google offers an analogy: the deep integration of its TPUs and Gemini models achieves the highest cost-effectiveness within its proprietary deep learning framework, an approach that has won strong recognition in both technology and capital markets. By running its self-developed models on its self-developed chips, Alibaba can likewise use deep software-hardware synergy to maximize the utilization of every computing unit on every chip.

On the model side, the newly released Qwen3.7-Max ranked first among domestic models in the overall ranking of the third-party Arena global large model blind test, close behind the strongest GPT, Claude, and Gemini models. More compelling is a practical case: on the Zhenwu M890, a chip it had not encountered before, Qwen3.7-Max worked autonomously from scratch for 35 hours, relying solely on a task description, and independently wrote and tuned a production-grade AI computing kernel, ultimately achieving a 10x performance improvement over the official version. No human intervention, no intermediate guidance, 35 hours, from zero to production-grade: this demonstrates the model's capability to "autonomously complete complex engineering tasks." The hardware it ran on was Alibaba's own self-developed chip, which makes the case a concrete illustration of chips and models evolving in tandem.

It's worth noting that the Qwen flagship model has iterated through three versions—3.5, 3.6, and 3.7—within the last three months. This release pace itself indicates that Alibaba is deliberately accelerating model evolution to match the exponential growth in model capability demands of the Agent era. Conversely, the speed of model iteration is ultimately constrained by the supply of computing power, bringing us back to the chip-cloud-model-inference system—a relationship of interlocking gears spiraling upward.

The reconstruction of the technical architecture must ultimately be understood in terms of business logic. Alibaba's financial report last week disclosed a key figure: the Annual Recurring Revenue (ARR) from AI model and application services has exceeded 8 billion yuan and is expected to surpass 30 billion yuan by year-end. On the day of the announcement, Alibaba's stock price surged 8%. Alibaba Cloud's internal assessment is even more radical: MaaS revenue driven by Agents will replace ECS as Alibaba Cloud's largest product line. Alibaba Cloud's business model is undergoing a fundamental change, with the growth engine shifting entirely from resource revenue measured in virtual machine units to AI revenue measured in Tokens.

The Bailian platform also follows an open ecosystem strategy, running not only Alibaba's self-developed Qwen model series but also simultaneously integrating third-party models like Zhipu GLM-5.1, MiniMax M2.7, Moonshot AI Kimi K2.6, and Vidu. Enterprise customers use multiple model combinations in real business scenarios. What Bailian aims to do is allow customers to find the best model combinations for each domain and the most cost-effective inference services on a single platform. As long as models are deployed on Alibaba Cloud, whether self-developed or third-party, they generate Token revenue. Liu Weiguang estimates that for AI-native startups, MaaS expenses constitute almost 100% of IT spending. Among Chinese internet companies, Token-related expenditures already account for 15%-20% of total IT spending. Traditional enterprises are still below 5%, but the growth curve is steep. Alibaba Cloud's internal target is that each enterprise customer's Token expenditure on Alibaba Cloud should be no less than 20% of that customer's total spending budget.

More specific changes are evident on the industry side. Selling Tokens actually expands Alibaba Cloud's business boundaries. Take the automotive industry: Alibaba Cloud could previously help automakers migrate systems like ERP to the cloud, later extending to computing power and cloud foundations for intelligent driving, and then to in-cabin large model dialogue. Today, tasks like customer marketing and ad generation, once completely outside the business scope of cloud providers, have become new revenue sources thanks to the spillover of AI capabilities. Even inside Alibaba Cloud, no one anticipated winning budget from these parts of clients' businesses. Another market previously inaccessible to cloud providers is enterprises' internal software development and outsourced labor. This budget long went to system integrators and outsourcing companies, leaving cloud providers no way in, but the emergence of AI Coding has turned it into Token expenditure. Together, these changes significantly raise the ceiling of the cloud computing industry. Previously, cloud providers' revenue was capped by the portion of enterprise IT budgets that could be migrated to the cloud, such as databases, middleware, and big data platforms; this was mostly a migration of existing spending, plus some new cloud-native demand. AI has pulled enterprise spending on internal operations management, marketing, and software development, areas not originally considered "IT infrastructure," into the revenue pool of cloud providers.

One more thing: for the first time in its 17-year history, Alibaba Cloud has launched a second official website, the Qwen Cloud website, which is essentially a response to the new business logic. Opening www.qianwenai.com reveals neither the product list nor the console of a traditional cloud website. The homepage contains a single command: `npx skills add QianWen-AI/qianwen-ai`. This is an Agent-readable prompt instruction. Alibaba Cloud has encapsulated the core capabilities of all its model services into standardized Skills and CLI tools that Agents can directly parse and invoke. When the primary consumers of the cloud are no longer humans but Agents, all interfaces, processes, and interaction logic designed for humans need to be rewritten. The last time a leading Chinese tech company rebuilt its product entry point with such determination may have been in the early days of the mobile internet, when everyone shifted traffic from PC websites to apps. This time the change is more thorough: apps at least require humans to open them, whereas an Agent only needs to read an instruction.

Returning to the initial question: how should Alibaba's resolve in cloud computing be evaluated? From a capital investment perspective, the trillion-level investment in AI infrastructure is indeed unprecedented. But focusing solely on the amount misses more fundamental changes. Alibaba Cloud has undertaken a thorough, full-stack reconstruction, from chips, cloud architecture, models, and inference platforms to product entry points. The bet behind it is that when Agents become the primary consumers of the cloud, whoever completes the infrastructure reconstruction first, with better performance and higher cost-effectiveness, secures the ticket for the next decade. The world's leading cloud providers, whether Google, AWS, Microsoft, or Alibaba Cloud, have all made the same choice. This is not the adventure of a single company but the consensus of an entire industry. What Alibaba Cloud wants to do now is charge ahead at the forefront.

