The cloud computing industry is gradually shifting toward a "resource-selling" model, while major AI model providers are becoming sellers of "Token fuel and outcomes." The recent price increase for Knowledge Atlas's GLM Coding Plan reflects a fundamental change in the industry's pricing logic: as inference consumption becomes a factor of production, model providers can convert computing-power scarcity into gross profit and cash flow through tiered pricing and subscription products. In the short term, watch for marginal improvements driven by price increases and demand growth (Token "inflation"); over the medium term, corporate seat expansion and subscription retention will indicate renewal and growth potential; over the long term, we are optimistic about the new "AI firewall" market created by the adoption of governance tools. Key viewpoints are outlined below.
Recent developments: On February 12, Knowledge Atlas announced through official channels an increase in subscription prices for its GLM Coding Plan, with a minimum hike of 30%. Earlier this month, overseas cloud providers also adjusted prices: Google Cloud raised prices by up to 100% in North America, with similar increases in Europe and Asia, while AWS raised prices by approximately 15%. Overall, inflationary Token demand not only benefits cloud providers' computing business but also strengthens model providers' pricing power.
This trend breaks with the traditional internet's free-access playbook. Conventional internet software typically starts free to build user scale, converts "user numbers and engagement time" into bargaining power, and then monetizes through advertising, membership subscriptions, value-added services, and transaction fees. The rationale for free offerings is extremely low marginal cost: each additional user or click costs almost nothing, diluted by bandwidth and storage scale effects toward zero. The cloud computing era saw a similar "free/low-cost entry followed by expansion," but cloud billing soon evolved into metering by CPU, storage, bandwidth, and request counts, and customers grew accustomed to pay-as-you-go models; clouds could charge because they delivered defined resources and Service Level Agreements. Amid the industry's ongoing "model price wars," however, Knowledge Atlas's price hike signals that the large-model era's key metric is shifting from traffic (DAU/engagement time) to Tokens (inference consumption), with Token usage becoming essential across an expanding range of scenarios.
Transformation in the large-model era: Tokens have become measurable factors of production, no longer free traffic. Large models have turned services such as dialogue, code generation, and content creation, previously delivered as ordinary software features, into compute-intensive online inference services. For model providers, each response consumes tangible GPU resources, memory, bandwidth, and power. For users, "giving the model more thinking time, generating longer code, or executing more complex tasks" maps directly to higher Token consumption, naturally establishing Tokens as the new unit of measure. Knowledge Atlas had previously adopted a "limited release" strategy for its Coding Plan because user growth strained computing capacity, creating a classic chain: surging short-term demand → rigid resource constraints (leading to throttling and limits) → price increases. During peak congestion and resource scarcity, price hikes let model providers filter demand and preserve user experience better than indiscriminate throttling. Moreover, model providers' costs remain closely tied to GPU supply, utilization rates, and inference optimization; price adjustments and more rational tiered pricing can help lift them out of the "larger scale, greater losses" trap, improving gross margin and cash-flow quality.
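The tiered-pricing mechanics described above can be sketched numerically. All tier boundaries and prices below are hypothetical illustrations, not Knowledge Atlas's actual rates; the structural point is that subscription tiers cap a light user's spend while metered overage keeps heavy usage margin-positive.

```python
# Sketch of tiered Token pricing vs. flat pay-as-you-go billing.
# Tier sizes and prices are invented for illustration only.

# (tokens included per month, monthly price in currency units)
TIERS = [
    (5_000_000, 20.0),     # hypothetical "light" tier: 5M tokens
    (25_000_000, 60.0),    # hypothetical "standard" tier: 25M tokens
    (100_000_000, 200.0),  # hypothetical "pro" tier: 100M tokens
]

FLAT_RATE_PER_M = 5.0  # hypothetical metered price per million tokens


def tiered_price(tokens_used: int) -> float:
    """Charge the smallest tier covering the month's usage."""
    for included, price in TIERS:
        if tokens_used <= included:
            return price
    # Usage beyond the top tier falls back to metered overage.
    top_included, top_price = TIERS[-1]
    overage = tokens_used - top_included
    return top_price + overage / 1_000_000 * FLAT_RATE_PER_M


def flat_price(tokens_used: int) -> float:
    """Pure pay-as-you-go, for comparison."""
    return tokens_used / 1_000_000 * FLAT_RATE_PER_M


if __name__ == "__main__":
    for usage in (2_000_000, 20_000_000, 150_000_000):
        print(usage, tiered_price(usage), flat_price(usage))
```

Under these made-up numbers, a light user (2M tokens) pays 20.0 on the tiered plan versus 10.0 metered, while a heavy user (150M tokens) pays 450.0 versus 750.0 metered: the tiers subsidize adoption at the low end and the overage meter filters extreme demand, which is the demand-filtering role the paragraph above describes.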
Token demand exhibits "inflationary" characteristics. "Token inflation" does not mean Tokens themselves become more expensive; it denotes a structural increase in Token consumption per user per unit of time. Several factors drive this: (1) The shift from "Q&A" to "task execution": as models advance, users increasingly employ them for code refactoring, file modification, document generation, and test execution. Programming scenarios inherently involve long contexts, many iterations, and high-volume output, resulting in substantial Token consumption; Knowledge Atlas's statements confirm that developers rely on its models for coding support, driving rapid growth in Token usage. (2) The transition from single-turn to Agent multi-turn interactions: Knowledge Atlas positions GLM-5 as a next-generation model for Coding and Agent scenarios; on February 12, MINIMAX-WP also launched its latest flagship programming model M2.5, marketed as the world's first production-grade model natively designed for Agent scenarios and benchmarked directly against Claude Opus4.6 on Coding & Agentic capabilities. Agents proactively plan, retrieve, execute, and reflect, invoking the model many times, so Token consumption accumulates step by step. (3) Rising inference intensity: deep thinking and extended reasoning chains significantly increase the Tokens spent on outputs and intermediate steps. For developers, this often translates into higher success rates and less rework, making users willing to "burn more Tokens for efficiency." In short, Tokens are not the near-zero-marginal-cost "traffic" of the traditional internet era but essential "fuel" for production tasks.
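The step-by-step accumulation in Agent workflows can be made concrete with a toy model. The per-turn token sizes below are invented for illustration; the structural point is that each model call replays the growing conversation context, so total consumption grows roughly quadratically in the number of turns rather than linearly.

```python
# Toy illustration of Agent-driven "Token inflation": every call
# re-sends the accumulated context, so consumption compounds per turn.
# All token counts are hypothetical, not measured figures.

def single_turn_tokens(prompt: int, output: int) -> int:
    """One Q&A call: a prompt goes in, one answer comes out."""
    return prompt + output


def agent_tokens(system_prompt: int, step_output: int, turns: int) -> int:
    """Multi-turn agent loop: each call replays the whole history."""
    total = 0
    context = system_prompt
    for _ in range(turns):
        total += context + step_output  # input context + this step's output
        context += step_output          # the output joins the next context
    return total


if __name__ == "__main__":
    # One answer vs. a 10-step agent run with the same step size.
    print(single_turn_tokens(1_000, 500))      # 1,500 tokens
    print(agent_tokens(1_000, 500, turns=10))  # 37,500 tokens: 25x one turn
```

With these made-up sizes, ten agent steps consume 25 times the Tokens of a single answer, not 10 times, because each step's output is re-billed as input on every subsequent call. This replay effect is why Agent adoption drives the structural demand growth described above.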
Investment recommendations: Cloud computing is becoming a "resource-selling" business, while large model providers are evolving into sellers of "Token fuel and outcomes." Knowledge Atlas's GLM Coding Plan price increase mirrors this shift in industry pricing logic: as inference consumption becomes a production factor, model providers can monetize computing-power scarcity through tiered pricing and subscriptions to improve profitability and cash flow. Future attention should focus on: (1) Cloud providers and computing infrastructure: AI-driven IT expenditure and infrastructure investment remain in an upward cycle, with cloud segments benefiting from sustained growth in "companion consumption" such as GPU computing power, storage, and network I/O. (2) Large model providers: their ability to sustain subscription retention and corporate seat expansion in high-ROI scenarios such as programming, Agent applications, and enterprise workflows, converting Token usage into delivered value through savings in labor, time, and rework, will determine their resilience against open-source alternatives and price competition. (3) Security governance and runtime protection tools: as enterprises integrate AI into workflows, risks such as data leaks and agent overreach will make "AI security/governance platforms" an essential layer. In the short term, watch for marginal improvements from price hikes and demand growth (Token "inflation"); over the medium term, track renewal and expansion via corporate seats and subscription retention; over the long term, we are optimistic about the new "AI firewall" market created by widespread adoption of governance tools.
Risk warnings: Uncertainty in technological roadmap evolution; intensifying industry competition.