CPU's Evolving Role in the AI Inference Era

The rise of Agentic AI is prompting a re-evaluation of the CPU's role in the AI inference era. This analysis explores the drivers behind increasing CPU demand, provides market projections, and examines the competitive landscape to assess the future potential for CPUs. While the optimal CPU-to-GPU ratio is not yet definitive in the short term, the long-term trend points towards a more significant role for CPUs.

Summary

Why is CPU demand growing?

During the large-model training phase, GPU-centric matrix compute is the core determinant of model capability, and the industry has focused on optimizing GPU performance metrics such as FLOPS. Since the second half of 2025, however, two key shifts have emerged: 1) On the training side, the growing importance of reinforcement learning has made the CPU/GPU ratio a key system metric; 2) On the inference side, three areas are driving demand: a) inference host CPUs, which work alongside GPUs on task scheduling and utilization optimization and can even handle simple inference tasks, acting as a partial substitute for GPUs; b) orchestration node CPUs, which act as independent nodes handling logic and task orchestration in complex agentic workflows; c) the sandbox execution layer, where rising task concurrency drives demand for more CPU resources.

How large is the CPU market driven by AI?

We estimate this from two perspectives: 1) A neutral, ratio-based estimate: assuming a CPU-to-GPU ratio of 1:1, we project the global CPU market could exceed $130 billion by 2030; 2) A usage-based estimate: under current Agentic AI usage scenarios (e.g., 500 million daily active users or 30 billion daily tokens), we estimate incremental CPU demand of around 8.4 million units.

Technologically, the "scheduler" CPU, akin to a new operating system, is trending towards stronger single-core performance, greater memory bandwidth, enhanced I/O capabilities, and higher core counts. Over the long term, we expect data center CPU evolution to focus on three main themes: enhanced data bandwidth capabilities, task specialization, and deeper integration with accelerators. Furthermore, given rapidly growing demand, we believe the trend of server CPU price increases could persist through 2026.

Competitive Landscape: x86 vs. Arm, who will prevail?

Currently, Arm holds less than a 20% share of the global server CPU market, which is still dominated by the x86 architecture. Agentic products feature high concurrency, continuous operation, and numerous lightweight inference requests (e.g., multi-turn dialogues, tool calls, planning). Arm's RISC-based architecture offers power efficiency advantages and supports more cores for handling concurrent requests, making it suitable for high-throughput inference serving. We anticipate its market share may increase in the future.

Risks

CPU demand falling short of expectations, intensifying market competition, and tight upstream production capacity.

Main Text

As inference demand continues to rise, discussion is heating up about server systems shifting their core from GPU-centric matrix computation towards a greater emphasis on CPUs for tasks like orchestration. We believe that in the long run, heterogeneous systems within servers will become the trend. This article focuses on four key questions: 1) From a demand perspective, what is driving the current increase in CPU demand? 2) By analogy with recent changes in storage, what is the current state of supply and demand in the CPU market? 3) From a long-term view, what are the future development trends for CPUs? 4) What is the competitive landscape of the CPU market?

Why is CPU demand growing?

During the training phase of large models, GPU-centric matrix compute is the core determinant of model capability, and the industry has focused on optimizing GPU performance metrics such as FLOPS. Since the second half of 2025, however, two key shifts have emerged: 1) On the training side, the growing importance of reinforcement learning has made the CPU/GPU ratio a key system metric; 2) On the inference side, three areas are driving demand: a) inference host CPUs, which work alongside GPUs on task scheduling and utilization optimization and can even handle simple inference tasks, acting as a partial substitute for GPUs; b) orchestration node CPUs, which act as independent nodes handling logic and task orchestration in complex agentic workflows; c) the sandbox execution layer, where rising task concurrency drives demand for more CPU resources.

Training Perspective: Reinforcement Learning Boosts CPU Demand

Reinforcement learning introduces new requirements for the CPU/GPU ratio. In contrast to the traditional view that GPUs are the only critical resource in the training phase, the rising importance of reinforcement learning makes CPU resource constraints worth attention. In current reinforcement learning practice, environment interaction and hardware resource allocation have become major system bottlenecks. Because running simulation environments requires substantial CPU resources, insufficient CPU capacity can leave GPUs idle. Designing the CPU/GPU ratio sensibly, such that the number of CPU threads equals or exceeds the number of GPU SMs, has therefore become an important consideration. In absolute terms, however, we judge that the CPU demand driven by reinforcement learning is relatively limited compared with inference.
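The threads-versus-SMs rule of thumb above can be checked for a given node configuration with a few lines of arithmetic. The sketch below is illustrative only: `NodeSpec`, its fields, and the example figures (8 GPUs, 132 SMs per GPU, two 64-core sockets with SMT) are assumptions chosen for the example, not configurations taken from this analysis.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical description of one training node (all fields are assumptions)."""
    gpus: int                  # accelerators in the node
    sms_per_gpu: int           # streaming multiprocessors per GPU
    cpu_sockets: int           # CPU packages in the node
    cores_per_socket: int      # physical cores per package
    threads_per_core: int = 2  # SMT assumed; set to 1 for CPUs without SMT

    @property
    def cpu_threads(self) -> int:
        return self.cpu_sockets * self.cores_per_socket * self.threads_per_core

    @property
    def total_sms(self) -> int:
        return self.gpus * self.sms_per_gpu


def thread_to_sm_ratio(node: NodeSpec) -> float:
    """CPU hardware threads available per GPU SM; the rule of thumb asks for >= 1."""
    return node.cpu_threads / node.total_sms


def extra_sockets_needed(node: NodeSpec) -> int:
    """Additional CPU sockets (same model) needed so that threads >= SMs."""
    threads_per_socket = node.cores_per_socket * node.threads_per_core
    shortfall = node.total_sms - node.cpu_threads
    return max(0, math.ceil(shortfall / threads_per_socket))


if __name__ == "__main__":
    node = NodeSpec(gpus=8, sms_per_gpu=132, cpu_sockets=2, cores_per_socket=64)
    print(f"CPU threads: {node.cpu_threads}, GPU SMs: {node.total_sms}")
    print(f"threads per SM: {thread_to_sm_ratio(node):.2f}")
    print(f"extra sockets to reach 1:1: {extra_sockets_needed(node)}")
```

Under these assumed figures a conventional dual-socket host falls well short of one thread per SM, which is the kind of gap the paragraph above describes; whether the rule needs to be met in full depends on how CPU-heavy the simulation environment is.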

Inference Perspective: The CPU Becomes a Bottleneck in the Agentic AI Era

Simple Inference: Cost Perspective and CPU Substitution for GPUs

From a cost perspective, CPUs hold some potential to substitute for GPUs. GPUs remain in tight supply, with no significant easing in rental prices or availability for high-performance models. While other compute chips struggle to substitute for GPUs in training, inference presents different dynamics. Performance requirements for matrix computation are lower, and for simple inference tasks like chatbots, the industry has begun using lower-FLOPS compute chips such as the RTX series. Given the significant price advantage of CPUs over GPUs, we believe CPUs could partially substitute for GPUs in some simple inference tasks, providing some demand pull. This is consistent with leading CSPs' push for ASIC chips and their exploration of customized CPU chips. However, this portion is difficult to quantify precisely, and its upside is limited.

Agentic AI: Rising Token Consumption Share and Complex Task Orchestration Make CPUs the New Bottleneck

We believe the CPU's role in the Agentic AI era has three key characteristics: 1) Overall: increased importance due to complex task chains and workflows; 2) Workload complexity: different workloads place varying demands on CPUs, and in workloads like RAG and ChemCrow the CPU has already become a core bottleneck; 3) Increased concurrency: higher concurrency further tightens the CPU bottleneck, driving demand for sandboxes at the execution layer.

Agentic AI, with its broader application capabilities, is gradually becoming mainstream. Building on generative models, it adds orchestration, memory, and goal-directed behavior, enabling multi-step task planning, tool invocation, result iteration, and operation within longer workflows. According to OpenRouter data, by the end of 2025, inference-generated tokens exceeded 50% of total tokens, with 15% of inference processes ending with an "external tool call."

Multi-step, multi-tool Agentic AI leads to more complex task flows. From a workflow perspective, traditional generative AI (like single-turn LLM dialogues) involves relatively simple input-output processes with fewer steps. As AI evolves towards agentic systems, inference processes involve more intricate steps, frequent use of different tools, and external API calls, elevating the importance of the CPU as the core orchestrator.

The emergence of tool processing requirements in Agentic AI tasks has made the CPU a new bottleneck in certain workload scenarios. Under a typical LLM execution pattern, the inference flow is: Inference 1 -> Tool Call 1 -> Inference 2 -> Tool Call 2 -> Inference 3... Because the system must wait for the LLM to generate all tokens for a complete tool call before execution begins, both GPU idle time (waiting for tool results) and tool idle time (waiting for model-generated instructions) can result. This necessitates CPU-based tool processing, significantly elevating the CPU's importance for tool handling alongside the GPU-centric core. Research such as the paper "A CPU-CENTRIC PERSPECTIVE ON AGENTIC AI" illustrates latency across different workloads, showing that tool processing on the CPU can constitute a large portion of end-to-end delay, which shifts industry optimization focus towards CPU-centric strategies.

From a dynamic perspective, CPU over-subscription becomes more severe as the number of concurrent tasks increases. With larger batch sizes and longer input/output token lengths, the impact of the CPU as a bottleneck grows across different workloads. In other words, more users or concurrent tasks demand higher CPU core counts. For instance, when the batch size reaches 128, the system needs to schedule hundreds of tool execution processes simultaneously, making CPU core count the new limiting factor. Increasing CPU resources shows clear benefits in reducing latency and improving system utilization.
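To make the over-subscription effect concrete, the following toy model estimates how much of one "inference then tool call" round is spent in CPU-side tool processing for a batch of concurrent requests. It is a deliberately simplified sketch: it assumes each tool call occupies one core for a fixed time, that tool calls run in waves when requests outnumber cores, and that batched GPU inference time does not grow with batch size. The function names and all timings are hypothetical and are not taken from the cited paper.

```python
import math


def step_latency(batch_size: int, cpu_cores: int,
                 t_infer: float, t_tool: float) -> tuple[float, float]:
    """Return (total_seconds, tool_seconds) for one inference + tool-call round.

    Assumptions (illustrative only):
      * the GPU serves the whole batch in t_infer seconds,
      * every request then issues one tool call that needs one core for t_tool,
      * with more requests than cores, tool calls execute in sequential waves.
    """
    if batch_size <= 0 or cpu_cores <= 0:
        raise ValueError("batch_size and cpu_cores must be positive")
    waves = math.ceil(batch_size / cpu_cores)
    tool_seconds = waves * t_tool
    return t_infer + tool_seconds, tool_seconds


def tool_share(batch_size: int, cpu_cores: int,
               t_infer: float, t_tool: float) -> float:
    """Fraction of the round spent in tool processing.

    Under the strictly sequential pattern this is also the fraction of time
    the GPU sits idle waiting for tool results.
    """
    total, tool_seconds = step_latency(batch_size, cpu_cores, t_infer, t_tool)
    return tool_seconds / total


if __name__ == "__main__":
    for cores in (16, 32, 64, 128):
        total, _ = step_latency(128, cores, t_infer=2.0, t_tool=1.0)
        share = tool_share(128, cores, t_infer=2.0, t_tool=1.0)
        print(f"cores={cores:4d}  round={total:5.1f}s  tool share={share:.0%}")
```

With a batch of 128 and these assumed timings, going from 32 to 128 cores cuts the round from 6 s to 3 s in this model, which mirrors the qualitative point that adding CPU resources reduces latency and GPU idle time. Real systems overlap phases and have variable tool durations, so the numbers should be read as directional only.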

Sandbox Execution Layer: Driving Growth in Multi-core Concurrency and Hardware Virtualization Demand

Complex agent tasks are driving rapid growth in sandbox demand. In enterprise scenarios, to ensure system security and a clean execution environment, the system typically spins up and subsequently tears down an independent micro-virtual machine or container (MicroVM/Container), i.e., a sandbox, for each external tool call request. Based on current task classifications, aside from a few read-only, pure API call, or pure local debugging tasks, tasks involving autonomous code execution or calls to external tools require sandboxes for their advantages in system security, efficiency control, and environment consistency.

Sandboxes draw on CPU hardware virtualization support (such as Intel VT-x/AMD-V) and impose a linearly increasing demand for physical CPU cores. When a system faces concurrent execution of dozens of tasks like web scraping, code compilation, or data cleaning, a large number of physical cores is the only solution for horizontal scaling and for reducing context-switching overhead between tasks, which places higher demands on CPU scheduling capability. The number of CPU cores determines how many sandboxes (parallel environments) can be active at once.
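The linear relationship between core count and sandbox capacity can be sketched as a simple sizing calculation. The helper names, the per-sandbox core allocation, and the number of cores reserved for the host are all assumptions made for illustration; real deployments may over-commit cores or pack sandboxes differently.

```python
import math


def max_concurrent_sandboxes(physical_cores: int,
                             cores_per_sandbox: float,
                             reserved_cores: int = 0) -> int:
    """Sandboxes one host can run at once without sharing cores.

    reserved_cores models cores kept back for the host OS and orchestrator
    (a hypothetical allowance, not a figure from the text).
    """
    if cores_per_sandbox <= 0:
        raise ValueError("cores_per_sandbox must be positive")
    usable = max(0, physical_cores - reserved_cores)
    return math.floor(usable / cores_per_sandbox)


def hosts_needed(concurrent_tool_calls: int, sandboxes_per_host: int) -> int:
    """Hosts required when every concurrent tool call gets its own sandbox."""
    if sandboxes_per_host <= 0:
        raise ValueError("sandboxes_per_host must be positive")
    return math.ceil(concurrent_tool_calls / sandboxes_per_host)


if __name__ == "__main__":
    per_host = max_concurrent_sandboxes(physical_cores=128,
                                        cores_per_sandbox=2,
                                        reserved_cores=8)
    print(f"sandboxes per host: {per_host}")
    print(f"hosts for 1,000 concurrent tool calls: {hosts_needed(1000, per_host)}")
```

Because capacity scales with usable cores in this model, doubling concurrent tool calls roughly doubles the cores required, which is the linear demand the paragraph above describes.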

How large is the CPU market driven by AI?

As discussed, training and simple inference contribute secondary demand for CPUs; the core driver is Agentic AI. With the rapid development of Agentic AI and the increasing proportion and complexity of multi-step inference tasks, requirements for task orchestration and scheduling continue to rise, leading to increased CPU demand and to discussion of how the CPU-to-GPU ratio will change. This section estimates the market from two dimensions.

Ratio Perspective: Agentic AI Drives Rapid Growth in Server CPU Market Size

We project the global CPU market could exceed $130 billion by 2030. The share of CPUs within a single AI server is set to increase, enhancing marginal system efficiency. The demand analysis in the first section makes clear that the number of CPUs and the core counts in past AI servers are no longer sufficient. To maintain high overall system throughput, server architecture needs to significantly boost CPU core counts and cache performance, leading to a higher share of CPU cost within total compute procurement. We therefore anticipate growth in CPU demand.

The specific ratio is crucial. Recent statements from major CPU vendors include: 1) Intel: the CEO mentioned on the 1Q26 earnings call that the CPU-to-GPU ratio could improve from 1:8/1:4; 2) AMD: the CEO projected on the 1Q26 earnings call that the global server CPU market could reach $120 billion by 2030; 3) Arm: the CEO projected on the 4Q26 earnings call that the global server CPU market could exceed $100 billion by 2030.

It must be noted that there is no consensus on the optimal CPU-to-GPU ratio. Given existing, relatively fixed server architectures, we expect inference servers will still primarily be configured with 1 CPU per 2 GPUs. However, considering Agentic AI demand, pure CPU racks will begin to be deployed. Taken together, the CPU-to-GPU ratio should gradually increase from the current 1:4 in 8-GPU servers, potentially reaching 1:1 or higher by 2030.

Based on a neutral 1:1 ratio projection, we estimate the global CPU market could exceed $130 billion by 2030. Core assumptions include: 1) Global compute chip shipments reaching 42.4 million units by 2030; 2) The AI server CPU-to-GPU ratio reaching 1:1 by 2030; 3) The AI server CPU unit price increasing by 16% from 2026 to 2030 due to higher core counts, improved performance, and foundry upgrades.
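The sensitivity of this estimate to the ratio assumption can be illustrated with the figures stated above (42.4 million compute chips in 2030, a 16% unit price increase from 2026 to 2030). The sketch below treats every compute chip as a "GPU" for the purpose of the ratio and holds accelerator volume fixed across scenarios; both simplifications, and the function names, are assumptions of this example and not the report's model. It does not attempt to reproduce the $130 billion figure, which also covers CPUs outside AI servers.

```python
COMPUTE_CHIPS_2030 = 42.4e6     # units, from the stated core assumptions
PRICE_UPLIFT_2026_2030 = 0.16   # cumulative AI server CPU price increase


def cpu_units(compute_chips: float, cpus_per_accelerator: float) -> float:
    """AI-server CPU units implied by an accelerator count and a CPU:GPU ratio."""
    return compute_chips * cpus_per_accelerator


def annualized_rate(total_uplift: float, years: int) -> float:
    """Constant yearly growth rate that compounds to total_uplift over `years`."""
    return (1.0 + total_uplift) ** (1.0 / years) - 1.0


def revenue_multiplier(ratio_from: float, ratio_to: float,
                       price_uplift: float) -> float:
    """Change in AI-server CPU spend per accelerator when both ratio and price move."""
    return (ratio_to / ratio_from) * (1.0 + price_uplift)


if __name__ == "__main__":
    for label, ratio in (("1:4", 0.25), ("1:2", 0.5), ("1:1", 1.0)):
        units = cpu_units(COMPUTE_CHIPS_2030, ratio)
        print(f"CPU:GPU {label} -> {units / 1e6:.1f}M AI-server CPUs")
    yearly = annualized_rate(PRICE_UPLIFT_2026_2030, years=4)
    print(f"16% over 2026-2030 is about {yearly:.1%} per year")
    mult = revenue_multiplier(0.25, 1.0, PRICE_UPLIFT_2026_2030)
    print(f"1:4 -> 1:1 with the price uplift: {mult:.2f}x CPU spend per accelerator")
```

Read this way, the ratio assumption dominates: moving from 1:4 to 1:1 quadruples CPU units for a given accelerator volume, while the 16% price assumption contributes under 4% per year.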

Demand Perspective: Agentic AI Presents New Requirements for CPU Quantity and Core Count

Total Volume: Current Agentic AI Scenario Drives Incremental Demand for Over 8 Million CPUs

We estimate that under current usage scenarios, Agentic AI drives incremental demand for approximately 8.4 million CPUs. Estimating CPU demand from the user side is complex, so we simplify by using the number of concurrent tasks to discuss Agentic AI's pull on CPUs. The core calculation logic is: 1) Estimate the number of concurrent tasks from daily active users or daily token consumption; 2) Allocate core parameters (task proportion, cores occupied, number of agents invoked) based on task complexity; 3) Calculate required CPU cores under four scenarios; 4) Estimate the required number of CPUs.
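The four-step logic above can be sketched as a small bottom-up calculation. Only the 500 million daily-active-user figure comes from the text; the tasks per user, task duration, tier definitions, cores per CPU, and utilization target below are placeholder assumptions, so the output is not expected to match the 8.4 million estimate and should not be read as a reconstruction of it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTier:
    """One complexity tier (all values used below are hypothetical)."""
    name: str
    share: float            # fraction of concurrent tasks in this tier
    agents_per_task: int    # agents invoked per task
    cores_per_agent: float  # CPU cores occupied per agent


def concurrent_tasks(dau: float, tasks_per_user_per_day: float,
                     avg_task_seconds: float, peak_factor: float = 1.0) -> float:
    """Step 1: concurrency = arrival rate x task duration (Little's law)."""
    arrivals_per_second = dau * tasks_per_user_per_day / 86_400
    return arrivals_per_second * avg_task_seconds * peak_factor


def required_cores(concurrency: float, tiers: list[TaskTier]) -> float:
    """Steps 2-3: spread concurrency across tiers and sum the cores occupied."""
    if not math.isclose(sum(t.share for t in tiers), 1.0):
        raise ValueError("tier shares must sum to 1")
    return sum(concurrency * t.share * t.agents_per_task * t.cores_per_agent
               for t in tiers)


def required_cpus(cores: float, cores_per_cpu: int,
                  target_utilization: float = 1.0) -> int:
    """Step 4: convert cores to CPU packages at a target utilization."""
    return math.ceil(cores / (cores_per_cpu * target_utilization))


if __name__ == "__main__":
    tiers = [
        TaskTier("simple", share=0.6, agents_per_task=1, cores_per_agent=0.5),
        TaskTier("medium", share=0.3, agents_per_task=2, cores_per_agent=1.0),
        TaskTier("complex", share=0.1, agents_per_task=4, cores_per_agent=2.0),
    ]
    conc = concurrent_tasks(dau=500e6, tasks_per_user_per_day=2,
                            avg_task_seconds=43.2)
    cores = required_cores(conc, tiers)
    cpus = required_cpus(cores, cores_per_cpu=128, target_utilization=0.5)
    print(f"concurrent tasks ~ {conc:,.0f}")
    print(f"required cores   ~ {cores:,.0f}")
    print(f"required CPUs    ~ {cpus:,}")
```

The structure makes the main sensitivities visible: the result scales linearly with task duration, agents per task, and cores per agent, which is why the allocation of parameters by task complexity in step 2 matters so much to the final unit count.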

Structure: Further Refinement of CPU Application Scenario Requirements

From a technology development perspective, the "scheduler" CPU, akin to a new OS, is upgrading primarily towards: 1) Stronger single-core performance to reduce per-inference latency; 2) Greater memory bandwidth and stronger I/O capabilities to manage longer contexts and massive data; 3) More cores to support high-concurrency queries and virtualization.

Price Trend: Short-term Supply-Demand Imbalance, Potential for Continued Server CPU Price Increases

Because CPU capacity allocation is somewhat opaque, precise supply-side estimates are lacking. Qualitatively, however, demand pull from Agentic AI and other factors continues to grow, leading to some degree of shortage and to price increases in the CPU market. Given the supply-demand gap, we believe the trend of server CPU price increases could persist through 2026. As of May 2026, we observed that Intel server CPUs had undergone two price increases, in February and March, ranging from 5% to 15%, with lead times for some models continuing to lengthen, reflecting growing demand.

► On the demand side, as mentioned, growth is primarily driven by AI inference demand, coupled with replacement demand for general-purpose servers. We expect global server shipments to grow nearly 20% year-over-year in 2026, with Agentic AI potentially accelerating future demand for AI and supporting servers.

► On the supply side, both AMD and Arm utilize TSMC's advanced process nodes. With demand for compute chips like GPUs and ASICs being continuously revised upward, TSMC's 2-5nm process orders are robust, and capacity expansion is limited. We expect the supply-demand gap to persist into 2027. Currently, AMD's CPU supply for 2027 still shows some flexibility. According to a statement, Intel's 18A process yield is improving steadily month-on-month, with the potential to achieve mature yield targets by the end of 2026.

Overall, we expect the supply-demand gap to last into 2027, potentially leading to further server CPU price increases in 2026. Additionally, benefiting from strong server CPU demand, supporting chips like PCIe retimers, PCIe switches, and memory interface chips are also noteworthy segments.
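For readers who want to translate successive price increases into a cumulative change, the compounding arithmetic is sketched below. It assumes, purely for illustration, that the 5% to 15% range applies to each of the two rounds on the same model; the text does not specify whether the range is per round or cumulative, so the outputs are bounds under that assumption rather than observed prices.

```python
def cumulative_increase(step_increases: list[float]) -> float:
    """Compound a sequence of fractional price increases into one cumulative change."""
    level = 1.0
    for step in step_increases:
        level *= 1.0 + step
    return level - 1.0


if __name__ == "__main__":
    low = cumulative_increase([0.05, 0.05])   # both rounds at the low end
    high = cumulative_increase([0.15, 0.15])  # both rounds at the high end
    print(f"two rounds at 5%:  {low:.2%} cumulative")
    print(f"two rounds at 15%: {high:.2%} cumulative")
```

Under that reading, two rounds would compound to roughly 10% at the low end and 32% at the high end, slightly more than the simple sum of the two steps.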

Long-term Trend: Inference Architecture Shifting from "GPU-attached CPU" to "CPU-strongly-bound Cluster"

Over the next decade, data center CPU evolution is unlikely to revert to the traditional logic of simply pursuing higher clock speeds or more cores. Instead, it will revolve around three main themes: enhanced data bandwidth capabilities, task specialization, and deeper integration with accelerators.

► CPUs will further evolve towards high-bandwidth data devices. As AI workloads expand from single tensor computations to large-scale context management and state maintenance, we expect the importance of memory channel count, memory bandwidth density, and cache capacity to continue rising. The adoption of LPDDR in data centers, the development of SOCAMM modules, and higher-channel DDR designs all point in one direction: hiding memory latency, increasing bandwidth density, and supporting large contexts are becoming core objectives. We believe future competition will focus not just on per-core performance but on data fabric organization capability and on-chip network bandwidth.

► CPUs will continue to differentiate to match different workloads, developing in three parallel directions: 1) Tightly coupled CPUs with high single-core performance, high memory bandwidth, and a coherent interconnect with AI accelerators; 2) DPU/data-plane type CPUs for KV-cache management, network layering, and data path processing; 3) Cloud-type CPUs with high core density and heavy throughput. This differentiation indicates that CPUs are not being replaced by GPUs but are taking on more specialized roles within the AI system.

► The boundary between CPUs and accelerators may further blur. APU architectures (like integrated CPU+GPU designs) may reduce the need for independent head nodes; some RL training loads might migrate to specialized accelerators with local environment execution capabilities; meanwhile, memory pooling and CXL expansion might reduce the traditional binding ratio of requiring a dedicated CPU per rack. From a longer-term perspective, CPUs might even be embedded within switch chips or data center network cores, becoming fundamental control units for data flow scheduling.

We believe the future value of CPUs lies not in replacing GPUs, but in managing system complexity. In the AI 2.0 era, improved model capabilities bring more interactions, longer contexts, and more external calls. The CPU, as a general-purpose execution and control unit, will remain the foundational component for maintaining system scalability. Its form may change, but its core position within the compute architecture will not disappear.

Competitive Landscape: x86 vs. Arm, Who Will Prevail?

x86 vs. Arm: x86 Leads in Share, Arm Poised to Accelerate Catch-up

Currently, Arm holds less than a 20% share of the global server CPU market, which is still dominated by the x86 architecture. x86 maintains a lead in ecosystem maturity, while Arm's share in cloud inference is expected to grow steadily.

Agentic products feature high concurrency, continuous operation, and numerous lightweight inference requests (e.g., multi-turn dialogues, tool calls, planning). Arm's RISC-based architecture offers power efficiency advantages and supports more cores for handling concurrent requests, making it suitable for high-throughput inference serving.

x86 retains its lead in ecosystem maturity. Many inference frameworks are more thoroughly optimized for x86, and some instruction set extensions offer specialized acceleration for matrix operations. We therefore believe that for scenarios involving larger model runs, mixed-precision computing, or deep integration with traditional software stacks, x86's compatibility and toolchain advantages are significant.

In summary, Arm-architecture CPUs, thanks to their high energy efficiency, are seeing large-scale deployment by CSPs, offering a cost-effective choice for CSPs' own services and for clients capable of optimizing software for Arm. x86 CPU servers offer a more complete ecosystem and strong compatibility, which means out-of-the-box generality and minimal migration friction for small and medium enterprises, sustaining broad and stable demand. We expect that as Agentic AI drives an increase in the CPU ratio within AI servers, and with Arm's continued breakthroughs among CSPs and enterprise clients, its share of the global server CPU market could approach half by 2030.

Risk Factors

CPU Demand Falling Short of Expectations. CPU demand primarily stems from applications across various scenarios under the Agentic AI trend. If Agentic AI progress is slower than expected, if the increase in the CPU ratio within server clusters is less than anticipated, or if major cloud providers slow their capital expenditure, CPU demand growth may fall short of expectations.

Intensifying Market Competition. The server CPU market is primarily divided into x86 and Arm camps. If competition intensifies between the x86 and Arm architectures, within the x86 camp between AMD and Intel, or within the Arm camp among various vendors and CSPs' in-house CPU designs, it could trigger market share battles and price pressure.

Tight Upstream Production Capacity. The upstream CPU supply chain involves numerous segments. If advanced process nodes and advanced packaging capacity remain tight, wafer supply is constrained, supply of key supporting components like memory is tight, or key equipment supply (e.g., deposition, metrology) is limited, CPU production capacity and market growth could face bottlenecks.

