GTHT: Breaking Memory Wall Constraints, AI SSDs Usher in Vast Growth Potential

Stock News
Oct 28

GTHT Securities released a research report stating that SSD-based storage offloading offers a new pathway for running AI models efficiently amid the "memory wall" challenges of large language model (LLM) development. The massive data volumes generated by AI are straining global data center storage infrastructure, and KV Cache offloading lets data migrate from GPU memory to CPU memory and SSDs.

Traditionally, Nearline HDDs have served as the cornerstone for mass data storage, but supply shortages are now driving high-performance, high-cost SSDs into the spotlight. GTHT Securities has assigned an "Overweight" rating to the electronics sector. Key insights from the report include:

**Industry Outlook & Investment Recommendation** The explosive growth of AI-generated data is overwhelming data center storage systems. KV Cache offloading allows data to shift from GPU memory to CPU memory and SSDs, alleviating bottlenecks. With Nearline HDDs facing supply constraints, high-performance SSDs are gaining traction, warranting an "Overweight" rating.

**KV Cache Capacity Outpaces HBM Limits** KV Cache improves inference efficiency by storing the attention keys and values of already-generated tokens, so they need not be recomputed at every decoding step. However, the cache lives in GPU memory (e.g., HBM) and grows with sequence length: as text sequences lengthen, cached data expands and risks exhausting HBM and DRAM.
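
To make the mechanism concrete, below is a minimal single-head sketch of incremental decoding with a KV Cache (the shapes and random projections are illustrative stand-ins, not any particular model):

```python
import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over the history
    return weights @ V                      # (head_dim,)

# Incremental decoding with a KV Cache: each new token's key/value is
# computed once, appended, and reused at every later step instead of
# re-projecting the whole sequence.
head_dim, steps = 64, 8
rng = np.random.default_rng(0)
K_cache = np.empty((0, head_dim))
V_cache = np.empty((0, head_dim))

for _ in range(steps):
    # Stand-ins for the projections of the newest token's hidden state.
    k_new, v_new, q_new = rng.standard_normal((3, head_dim))
    K_cache = np.vstack([K_cache, k_new])   # cache grows one row per token
    V_cache = np.vstack([V_cache, v_new])
    out = attend(q_new, K_cache, V_cache)   # attends over the full history

print(K_cache.shape)  # (8, 64): one cached key per generated token
```

Without the cache, every decoding step would re-project keys and values for the entire prefix; with it, per-step compute stays roughly constant, but the cache itself grows without bound.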

With the rise of Agentic AI, ever-larger models, surging long-sequence demand, and highly concurrent inference are together pushing KV Cache capacity beyond HBM limits. Frequent memory overflows force GPUs to recompute evicted cache entries, adding latency.
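
A back-of-the-envelope calculation shows why. Assuming a hypothetical 70B-class model with grouped-query attention (all figures below are illustrative assumptions, not from the report):

```python
# Hypothetical 70B-class configuration (illustrative, not from the report).
num_layers   = 80
num_kv_heads = 8        # grouped-query attention
head_dim     = 128
bytes_per_el = 2        # FP16

# K and V are each cached per layer, per KV head, per token.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_el
print(f"{bytes_per_token / 1024:.0f} KB per token")     # 320 KB

for seq_len in (8_192, 131_072):
    gb = bytes_per_token * seq_len / 1024**3
    print(f"{seq_len:>7} tokens -> {gb:.1f} GB of KV Cache")
# 8,192 tokens -> 2.5 GB; 131,072 tokens -> 40.0 GB per sequence, so a
# handful of concurrent long-context requests exceeds one GPU's HBM.
```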

**Offloading KV Cache to CPU & SSD** As inference performance gains importance, the industry is exploring tiered KV Cache management. NVIDIA's Dynamo framework, launched in May, supports offloading KV Cache from GPU memory to CPU memory, SSD, or network storage, mitigating memory bottlenecks. Its KV Block Manager (KVBM) tiers the cache across GPU memory (G1), CPU host memory (G2), local SSD (G3), and remote storage (G4), minimizing recomputation.
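
The core idea behind any such tiered design can be sketched generically. The toy class below (not Dynamo's actual API; all names are hypothetical) spills least-recently-used KV blocks from a small hot tier to a colder one instead of discarding them, so a later hit avoids recomputation:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy tiered KV block store: a hot tier (e.g., GPU/HBM) in front of
    a colder tier (e.g., CPU DRAM or SSD). Illustrative only; a real
    system manages device buffers and asynchronous transfers."""

    def __init__(self, hot_capacity, cold_store):
        self.hot = OrderedDict()          # block_id -> KV block (LRU order)
        self.hot_capacity = hot_capacity
        self.cold = cold_store            # dict-like colder tier

    def put(self, block_id, block):
        self.hot[block_id] = block
        self.hot.move_to_end(block_id)
        if len(self.hot) > self.hot_capacity:
            # Evict the least-recently-used block to the colder tier
            # instead of dropping it, so it never needs recomputation.
            victim, data = self.hot.popitem(last=False)
            self.cold[victim] = data

    def get(self, block_id):
        if block_id in self.hot:
            self.hot.move_to_end(block_id)
            return self.hot[block_id]
        if block_id in self.cold:         # offload hit: promote back
            self.put(block_id, self.cold.pop(block_id))
            return self.hot[block_id]
        return None                       # miss -> caller must recompute

cache = TieredKVCache(hot_capacity=2, cold_store={})
for i in range(4):
    cache.put(i, f"kv-block-{i}")         # blocks 0 and 1 spill to cold
assert cache.get(0) == "kv-block-0"       # served from cold, no recompute
```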

At the 2025 Open Data Center Conference, a senior project manager from Samsung proposed an SSD-based offloading solution to address LLM memory constraints. By migrating KV Cache to NVMe SSDs, the approach reduces time to first token (TTFT) by up to 66% and inter-token latency (ITL) by 42%, while supporting KV reuse across multi-user, multi-round dialogues. I/O throughput scales steadily under load, with traffic dominated by 256 KB read/write operations.
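
As a rough illustration of what persisting KV blocks in that I/O pattern might look like (a hypothetical sketch, not Samsung's implementation; paths and helper names are invented, and a production system would likely use direct or asynchronous NVMe I/O rather than buffered file writes):

```python
import os

CHUNK = 256 * 1024  # 256 KB, matching the dominant I/O size noted above

def dump_kv_block(path: str, payload: bytes) -> None:
    """Write a serialized KV block to SSD in fixed 256 KB chunks.
    Hypothetical sketch: real deployments would favor O_DIRECT or
    io_uring against an NVMe namespace over buffered writes."""
    with open(path, "wb") as f:
        for off in range(0, len(payload), CHUNK):
            f.write(payload[off:off + CHUNK])
        f.flush()
        os.fsync(f.fileno())  # ensure durability before HBM is reused

def load_kv_block(path: str) -> bytes:
    """Stream the block back in the same 256 KB chunk size."""
    chunks = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            chunks.append(chunk)
    return b"".join(chunks)

blob = os.urandom(5 * CHUNK + 123)        # stand-in for serialized K/V tensors
dump_kv_block("/tmp/kv_block.bin", blob)
assert load_kv_block("/tmp/kv_block.bin") == blob
```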

**AI Storage Demand Spurs HDD Replacement** TrendForce reports that AI inference is accelerating demand for real-time access and high-speed processing of massive datasets, prompting HDD and SSD suppliers to ramp up high-capacity storage production. With HDD supply gaps widening, NAND Flash manufacturers are fast-tracking 122TB and 245TB Nearline SSD development.

**Risks**: Slower-than-expected domestic substitution; delayed technology iteration.

