NVIDIA Diversifies Product Lines to Address Both AI Training and Inference Demands as CSPs Expand Custom ASIC Development

According to the latest AI server research from TrendForce, as major cloud service providers (CSPs) intensify their development of custom chips, NVIDIA shifted its focus at GTC 2026 toward deploying AI inference applications across various industries, a departure from its earlier concentration on the cloud AI training market. The company is adapting its strategy by promoting a diversified portfolio of GPUs, CPUs, and LPUs to address AI training and inference demands separately, and by driving supply chain growth through integrated rack solutions. TrendForce notes that as custom chip initiatives led by CSPs such as Google and Amazon expand, ASIC-based AI servers are projected to grow from 27.8% of total AI server shipments in 2026 to nearly 40% by 2030.
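
As a back-of-the-envelope check on that projection, the two share figures from the article imply roughly three percentage points of share gain per year, or about 9.5% compound annual growth in the ASIC share itself. A minimal Python sketch of the arithmetic (the inputs are TrendForce's published figures; the calculation is only illustrative):

```python
# TrendForce projection: ASIC AI servers at 27.8% of shipments in 2026,
# nearly 40% by 2030. Figures are from the article; the math is a check.
share_2026, share_2030, years = 0.278, 0.40, 2030 - 2026

linear_gain = (share_2030 - share_2026) / years      # share points per year
cagr = (share_2030 / share_2026) ** (1 / years) - 1  # relative growth of share

print(f"~{linear_gain * 100:.1f} percentage points per year")  # ~3 pp/yr
print(f"~{cagr * 100:.1f}% compound annual growth in share")   # ~9.5%/yr
```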

To reinforce its leadership in the AI market, NVIDIA is actively promoting integrated rack-scale solutions such as the GB300 and VR200, which combine CPUs and GPUs, emphasizing their scalability for AI inference applications. The Vera Rubin system introduced at GTC is positioned as a highly vertically integrated complete system spanning seven chip types and five rack configurations. On the Vera Rubin supply chain timeline, memory manufacturers are expected to supply HBM4 for the Rubin GPU by the second quarter of 2026, supporting NVIDIA's planned Rubin chip shipments around the third quarter.

Among the GB300 and VR200 rack systems, the GB300 replaced the GB200 as the primary product in the fourth quarter of 2025, and its shipment share is estimated to reach nearly 80% in 2026. VR200 rack shipments are expected to ramp up gradually toward the end of the third quarter of 2026, though the pace will depend on actual progress at ODMs.

Furthermore, as AI transitions from generative to agent-based models, the decoding phase of token generation faces significant latency and memory-bandwidth bottlenecks. In response, NVIDIA, integrating technology from the Groq team, has introduced the Groq 3 LPU, designed specifically for low-latency inference. A single Groq 3 chip incorporates 500MB of SRAM, and a full rack supports up to 128GB. However, the LPU's built-in memory capacity is insufficient for the massive parameters and KV cache that systems like Vera Rubin handle. NVIDIA therefore proposed a "Disaggregated Inference" architecture at GTC. Built on an AI factory operating system called Dynamo, the architecture splits the inference pipeline into two parts: for agent-based AI, the pre-fill and attention stages, which involve heavy mathematical operations and store large KV caches, run on the high-throughput, large-memory Vera Rubin system, while the decoding and token-generation stages, which are bandwidth-constrained and highly sensitive to latency, are offloaded to LPU racks expanded with massive memory.
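
To make the split concrete, below is a minimal, purely illustrative Python sketch of a disaggregated inference pipeline: one tier runs the compute-heavy pre-fill and produces the KV cache, and a second tier runs the latency-sensitive token-by-token decode against that cache. Every class and function name here is hypothetical, invented for this sketch; it is not NVIDIA Dynamo's actual API, and the KV-cache sizing constants are placeholder numbers.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Key/value tensors produced during pre-fill. In a real system this is
    large (GBs per request) and must move from the pre-fill tier to the
    decode tier, which is why the decode tier needs expanded memory."""
    tokens: list
    size_bytes: int

class PrefillTier:
    """High-throughput, large-memory tier (a Vera Rubin-class rack in the
    article's framing): runs the compute-heavy pre-fill/attention pass over
    the full prompt and materializes the KV cache."""
    def prefill(self, prompt: str) -> KVCache:
        tokens = prompt.split()
        # Rough KV-cache sizing: 2 (K and V) * layers * kv_heads * head_dim
        # * bytes_per_element * n_tokens. All constants are illustrative.
        layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2
        size = 2 * layers * kv_heads * head_dim * dtype_bytes * len(tokens)
        return KVCache(tokens=tokens, size_bytes=size)

class DecodeTier:
    """Bandwidth-bound, latency-sensitive tier (LPU racks in the article's
    framing): generates tokens one at a time against the transferred cache."""
    def decode(self, cache: KVCache, max_new_tokens: int) -> list:
        out = []
        for i in range(max_new_tokens):
            out.append(f"<tok{i}>")   # stand-in for a real sampling step
            cache.tokens.append(out[-1])
        return out

def serve(prompt: str) -> str:
    prefill, decode = PrefillTier(), DecodeTier()
    cache = prefill.prefill(prompt)       # compute-bound stage, big memory
    new_tokens = decode.decode(cache, 8)  # latency-bound stage, big bandwidth
    return " ".join(new_tokens)

if __name__ == "__main__":
    print(serve("Explain disaggregated inference in one sentence"))
```

The design point the article attributes to Dynamo is visible even in this toy version: the two stages have different hardware bottlenecks (compute and memory capacity for pre-fill, bandwidth and latency for decode), so mapping them onto separately optimized racks lets each tier be provisioned for its own constraint.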

On the supply chain front, the third-generation Groq LP30, manufactured by Samsung, has entered full mass production and is scheduled for official shipment in the second half of 2026. Looking ahead, a higher-performance LP40 chip is planned for the next-generation Feynman architecture.

