The cloud computing market is experiencing a wave of price hikes. Following signals of price increases from Amazon AWS, Google Cloud, and Tencent Cloud, Alibaba Cloud and Baidu Cloud both announced price adjustments on the same day.
On March 18, Alibaba Cloud announced via its official website that due to the global surge in AI demand and rising supply chain costs, the prices of its AI computing power and storage products, among others, will be increased, with a maximum increase of 34%. Specifically, computing card products, such as the T-Head Zhenwu 810E, will see price increases ranging from 5% to 34%, while the file storage product CPFS (AI Computing Edition) will increase by 30%.
On the same day, Baidu AI Cloud issued an announcement regarding price adjustments for its AI computing power and storage products. It stated that with the rapid global development of artificial intelligence applications, computing power demand continues to climb, and costs for core hardware and related infrastructure have risen significantly. To ensure the platform's long-term stable operation and service quality, it plans a structural optimization of pricing for some products.
With these moves, major domestic and international cloud service providers have largely joined the price increase trend.
The wave of cloud service price hikes has spread globally since early 2026. Both Amazon AWS and Google Cloud announced price increases for some services at the beginning of the year. Notably, Google Cloud made significant adjustments to the pricing of data transfer services like CDN Interconnect, Direct Peering, and Carrier Peering, with prices in North America increasing by 100%.
Domestically, Tencent Cloud announced price increases for its large model services on March 11. Tencent Cloud stated that to continue providing stable and high-quality services, it has adjusted the billing strategies for some models on its Intelligent Agent Development Platform. Taking the Tencent HY2.0 Instruct model as an example, its input price has been significantly raised from the original 0.0008 RMB per thousand tokens to 0.004505 RMB per thousand tokens, an increase of 463.13%.
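As a quick sanity check, the stated percentage increase can be reproduced from the per-thousand-token prices quoted above with a few lines of Python (the variable names here are illustrative, not Tencent's own):

```python
# Sanity-check the quoted price increase for the Tencent HY2.0 Instruct model.
old_price = 0.0008    # RMB per thousand input tokens (original)
new_price = 0.004505  # RMB per thousand input tokens (after adjustment)

# Percentage increase = (new - old) / old * 100
increase_pct = (new_price - old_price) / old_price * 100
print(f"{increase_pct:.1f}%")  # → 463.1%
```

This matches the announced figure of 463.13% once rounded.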
On March 16, Zhipu announced the launch of its base model GLM-5-Turbo, designed for the open-source agent framework OpenClaw (also known as "Lobster"). It is the first closed-source model Zhipu has released recently. At the same time, Zhipu raised the API price for the new model by 20%, its second price hike in a short span. Rough calculations indicate that GLM-5-Turbo prices have risen by an average of 83% compared with GLM-4.7, with API prices nearly doubling, a clear pattern of volume and prices rising together.
Analysis also reveals that "Data Transfer and Networking" items have become the most affected category in this round of cloud service price increases, with related service hikes generally concentrated between 10% and 40%. Leading providers, including Amazon AWS, Google Cloud, Microsoft Azure, Tencent Cloud, and Wangsu Technology, have all included data transfer or network-related services in their price increase scope. This trend indicates that cloud providers are passing on the rising costs of bandwidth and network infrastructure to users.
Furthermore, extreme cases exist within this wave of adjustments: the price of certain network services in Google Cloud's North America region doubled outright. As for timing, major providers are moving in quick succession, with a leading company announcing a price increase almost every month, a clear follow-the-leader effect across the industry.
Tight computing power supply is forcing cloud providers to re-evaluate their pricing. Wang Zhijie, Product Director at CDN service provider Wangsu Technology, stated that the phase of cloud computing price wars has ended, and the industry is entering a cycle of value return, marking a shift from "scale-first" to "profit-first" rational pricing.
He emphasized that this round of price increases is not a simple cyclical fluctuation but a passive market response to the imbalance between AI computing power supply and demand. On the supply side, costs for GPUs, storage, bandwidth, and power are rising inexorably as hardware iteration accelerates; on the demand side, AI training and inference workloads are exploding, underscoring the scarcity of resources.
Regarding demand structure, Wang Zhijie observed that from 2025 to the first quarter of this year, the market overall exhibited characteristics of "stable training demand and exponentially growing inference demand." Industry data shows that monthly large model API call volume has grown at a rate of 30% month-over-month, while video generation and real-time interactive applications continue to drive a surge in edge inference computing power demand.
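To put that growth rate in perspective, a 30% month-over-month rate compounds to roughly a 23x increase over a year. A minimal illustration:

```python
# Compound a 30% month-over-month growth rate over 12 months.
monthly_growth = 0.30
annual_factor = (1 + monthly_growth) ** 12
print(f"{annual_factor:.1f}x per year")  # → 23.3x per year
```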
"Traditional cloud services follow a cost-reduction path based on 'Moore's Law + economies of scale,' but the marginal cost of AI computing power actually increases with scale, leading providers into a paradoxical situation where 'selling more leads to losing more'," said an industry practitioner. Under pressure to survive, the industry is repairing profit margins through structural price increases, which is a sign of the industry maturing.
Simultaneously, the demands that AI applications place on underlying infrastructure are rapidly escalating. The aforementioned practitioner explained that the challenge platforms face is no longer just having resources, but also being able to schedule them efficiently. This includes managing hybrid scheduling of heterogeneous resources like CPUs, GPUs, and FPGAs, supporting seamless migration of AI tasks between edge, core, and cloud, compressing large model edge loading latency, and addressing the pressure from liquid cooling and power supply upgrades required by rapidly increasing single-rack power densities.
"Network transmission is the second largest cost item for cloud providers, after computing. It is an inevitable trend for CDN services to follow suit with price increases," Wang Zhijie stated. In the AI era, low-latency inference services must rely on intelligent interconnection between edge nodes and central cloud. Platforms also face higher requirements regarding content compliance, security, stability, and elasticity. In his view, the role of CDN has fundamentally transformed—"evolving from content distribution to a distributed computing power scheduling network integrating transmission, computing power, and inference."
The Agent trend is adding fuel to the fire. Looking at Alibaba Cloud's recent price hike, another key driver is the "surge in Token consumption." According to insiders, Alibaba Cloud's MaaS platform, Bailian, posted its highest-ever growth rate from January to March this year, and Alibaba Cloud is directing its scarce AI computing resources preferentially toward its Token inference business.
Trend-wise, the latest AI models do more "thinking," especially in areas like deep research, AI Agents, and code generation. Consequently, although the price per Token continues to fall, the number of Tokens required to complete a task is rising sharply.
Observations indicate that as the open-source AI Agent framework OpenClaw gains rapid popularity among developer communities, AI applications are shifting from chatbot forms towards Agents capable of long-term operation and executing complex tasks.
Data from OpenRouter, the world's largest AI model API aggregation platform, shows that OpenClaw's Token consumption surged from 80.6 billion on February 3, 2026, to 358 billion by March 4, an increase of approximately 3.4 times within a month. By the week of March 2, the weekly Token call volume on the OpenRouter platform had reached 14.8 trillion, a growth of about 160% within two months, with OpenClaw contributing the vast majority. According to data from Anthropic, Token consumption for AI Agents can be up to 15 times higher than for ordinary chat interactions.
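The quoted growth figure checks out arithmetically, treating "an increase of approximately 3.4 times" as the relative change between the two dates:

```python
# Verify the OpenClaw token-consumption growth quoted from OpenRouter data.
feb_tokens = 80.6e9   # tokens consumed on Feb 3, 2026
mar_tokens = 358e9    # tokens consumed on Mar 4, 2026

# Relative increase = (new - old) / old
relative_increase = (mar_tokens - feb_tokens) / feb_tokens
print(f"increase: {relative_increase:.1f}x")  # → increase: 3.4x
```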
On March 17, during the GPU Technology Conference (GTC) 2026, NVIDIA CEO Jensen Huang stated that when AI Agents work, a single task often requires repeated calls to the inference capabilities of multiple models and tools, leading to an order-of-magnitude increase in Token consumption.
Huatai Securities pointed out that the accelerated release of Claw-like products may drive faster evolution of Agents, also boosting Token consumption, inference computing power demand, and related infrastructure investment.
The popularity of "Lobster" is further catalyzing cloud resource tightness. As Lobster's Token consumption grows exponentially, the corresponding consumption of underlying computing power and bandwidth is also expected to experience explosive growth. This could widen the magnitude of industry price increases or bring forward the timing of future hikes.
According to IDC predictions, as AI Agents handle increasingly complex tasks, with their inference depth and call chains continuously extending, underlying Token consumption is expected to leap by orders of magnitude. Data shows that annual Token consumption is forecast to surge from 0.0005 Peta Tokens in 2025 to 152,667 Peta Tokens by 2030, representing a compound annual growth rate of 3418%.
"Facing the exponential growth in Token consumption, cost and energy consumption will become key constraints. Enterprises need to make forward-looking plans regarding computing resources, model selection, and configuration," suggested Sun Zhenya, Senior Research Manager at IDC China.