GF Securities: AI Memory's Upstream Infrastructure Is Rising in Value and Importance; Recommends Focusing on Core Supply-Chain Beneficiaries

Stock News
Feb 03

GF Securities released a research report stating that this is AI's "Memory moment": AI memory is becoming the underlying capability that supports contextual continuity, personalization, and the reuse of historical information. It is continuously expanding the boundaries of model capabilities and is expected to accelerate the deployment of applications such as AI Agents. The value of AI memory is shifting from a "cost item" to an "asset item," and the value and importance of the related upstream infrastructure will continue to increase. The firm recommends focusing on the core beneficiaries within the industrial chain.

NVIDIA has launched the ICMS AI inference context storage platform. As KV Cache accumulates across users' multi-turn conversations and the continuous operation of Agents, the system develops a hard requirement for a hierarchical KV Cache that can be retained long-term and backfilled on demand, driving context to spill over from HBM into hierarchical media such as DRAM and SSD. To address this, NVIDIA introduced the ICMS context memory storage architecture, which provides a "long-term context memory layer" for Agent and multi-turn inference scenarios: on one hand, it holds KV Cache at a much larger scale; on the other, it backfills historical KV Cache into multi-round inference sessions across multi-GPU nodes at low latency. Its KV access pattern is characterized by high-concurrency, high-throughput random reads under tight Time-To-First-Token constraints.
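To make the spill-and-backfill pattern concrete, here is a minimal Python sketch of a hierarchical KV Cache with HBM, DRAM, and SSD tiers. The tier capacities, the LRU eviction policy, and all class and method names are illustrative assumptions for this sketch, not NVIDIA's actual ICMS implementation.

```python
# A minimal sketch of a hierarchical KV-cache store, assuming a three-tier
# HBM -> DRAM -> SSD layout like the one described above. Capacities and
# the LRU policy are placeholders, not ICMS internals.
from collections import OrderedDict

class Tier:
    def __init__(self, name: str, capacity_blocks: int):
        self.name = name
        self.capacity = capacity_blocks
        self.blocks: OrderedDict[str, bytes] = OrderedDict()  # LRU order

    def full(self) -> bool:
        return len(self.blocks) >= self.capacity

class HierarchicalKVCache:
    """Spills cold KV blocks downward; backfills hot ones on demand."""
    def __init__(self):
        # Placeholder block counts, purely for illustration.
        self.tiers = [Tier("HBM", 4), Tier("DRAM", 16), Tier("SSD", 1024)]

    def put(self, key: str, block: bytes) -> None:
        self._insert(0, key, block)

    def _insert(self, level: int, key: str, block: bytes) -> None:
        tier = self.tiers[level]
        if tier.full() and level + 1 < len(self.tiers):
            # Spill the least-recently-used block to the next tier down.
            # (The lowest tier is simply allowed to grow in this sketch.)
            cold_key, cold_block = tier.blocks.popitem(last=False)
            self._insert(level + 1, cold_key, cold_block)
        tier.blocks[key] = block

    def get(self, key: str) -> bytes | None:
        for level, tier in enumerate(self.tiers):
            if key in tier.blocks:
                block = tier.blocks.pop(key)
                if level > 0:
                    self._insert(0, key, block)  # backfill into HBM for reuse
                else:
                    tier.blocks[key] = block     # refresh LRU position
                return block
        return None  # cache miss: the prefill must be recomputed
```

The key design point this sketch mirrors is that a miss in HBM is served by backfilling the saved KV block from a lower tier rather than recomputing the prefill, which is what makes long-retained context economical.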

The ICMS platform demonstrates that SSDs can be used effectively, with good economics and scalability. On economics and scalability: the unit cost of SSDs is far below that of GPU memory, and capacity can be scaled into the terabyte and petabyte range, making SSDs a natural medium for long-term context. On feasibility: according to the report "Context Memory Storage Systems, Disruption of Agentic AI Tokenomics, and Memory Pooling Flash vs DRAM," after a petabyte-scale storage layer is introduced, ICMS's access latency is only slightly higher than that of pooled DRAM. Empirically, WEKA evaluated the performance of its Augmented Memory Grid (AMG), a context storage solution compatible with NVIDIA's ICMS. The test simulated a continuously expanding user pool during the decode phase. (1) While the user pool was small, the KV Cache resided mainly in GPU HBM, and all three configurations (HBM+WEKA AMG, HBM+DRAM, HBM+DRAM+POSIX file system) sustained high token throughput. (2) As the number of users continued to grow, the KV Cache spilled into lower memory/storage tiers and token throughput began to decline; WEKA AMG, however, leveraging greater capacity and stronger networking and concurrent random-access capability, completed context prefetching and backfilling faster, reducing cold starts and blocking and thereby sustaining higher, more stable token throughput in the large-user-pool phase. Its throughput improvement over the HBM+DRAM and HBM+DRAM+POSIX configurations reached up to 4x, validating that ICMS can effectively absorb long-term context while keeping throughput stable.
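The following toy model illustrates the qualitative shape of that benchmark: throughput is flat while contexts fit in HBM, then diverges between a fast and a slow spill tier once the user pool outgrows it. Every latency, capacity, and resulting ratio below is an arbitrary placeholder chosen for illustration, not a WEKA or NVIDIA measurement.

```python
# Toy model of the benchmark dynamic: as the user pool grows past what HBM
# can hold, a growing share of KV reads is served from a spill tier, and the
# speed of that tier dominates decode throughput. All numbers are assumed.

HBM_USERS = 1_000  # users whose full context fits in HBM (assumed)
LAT_US = {"hbm": 5.0, "fast_tier": 8.0, "slow_tier": 50.0}  # per-token, assumed

def aggregate_tps(users: int, spill_lat_us: float) -> float:
    """Tokens/s of a fixed-size decode pipeline with one spill tier."""
    hbm_share = min(users, HBM_USERS) / users   # fraction of reads hitting HBM
    avg_lat = hbm_share * LAT_US["hbm"] + (1.0 - hbm_share) * spill_lat_us
    return 1e6 / avg_lat

if __name__ == "__main__":
    for users in (500, 5_000, 50_000):
        fast = aggregate_tps(users, LAT_US["fast_tier"])  # AMG-like spill tier
        slow = aggregate_tps(users, LAT_US["slow_tier"])  # POSIX-FS-like tier
        print(f"{users:>6} users: fast {fast:,.0f} tok/s, "
              f"slow {slow:,.0f} tok/s, ratio {fast/slow:.1f}x")
```

With a small pool (500 users) both configurations print the same throughput; at 50,000 users the fast spill tier sustains a multiple of the slow tier's throughput, which is the qualitative behavior the WEKA test reports.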

The ICMS platform opens up market space for storage. Referencing Vast Data, the report estimates the size of the context storage market as follows. (1) Storage per token: assuming 100,000 concurrent online users or Agents running Llama 3.1 405B, the KV Cache footprint is 504 KB per token. (2) Storage per user context window: at 64,000 tokens per context window, each window requires approximately 30 GB. (3) Retention multiplier: for a better user experience, assume 15x the live window is retained as history. Under these assumptions, total storage demand for 100,000 users is approximately 45 PB. That is, stably supporting 100,000 users/Agents on a large-context model with strong dialogue-history capabilities requires context storage at the tens-of-petabytes level.
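The sizing arithmetic can be checked directly. The sketch below reproduces the report's figures; the only assumption added here is the use of binary units (KiB/GiB/PiB), which is the convention that recovers its ~30 GB and ~45 PB results.

```python
# Reproducing the report's context-storage sizing. Input figures come
# straight from the report; binary units (KiB/GiB/PiB) are an assumption.
KV_PER_TOKEN_KIB = 504    # KV Cache per token, Llama 3.1 405B (per the report)
CONTEXT_TOKENS = 64_000   # tokens per user context window
RETENTION_MULTIPLIER = 15 # multiple of the live window retained as history
USERS = 100_000           # concurrent users/Agents

per_window_gib = KV_PER_TOKEN_KIB * CONTEXT_TOKENS / 1024**2
total_pib = per_window_gib * RETENTION_MULTIPLIER * USERS / 1024**2

print(f"per context window: {per_window_gib:.1f} GiB")  # ~30.8 -> report's ~30 GB
print(f"total, 100k users:  {total_pib:.1f} PiB")       # ~44.0 -> report's ~45 PB
```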

Risks include AI industry development and demand falling short of expectations; AI server shipments falling short of expectations; and domestic manufacturers' technology and product progress falling short of expectations.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation to acquire or dispose of any financial products; nor should any associated discussions, comments, or posts by the author or other users be considered as such. It is solely for general information purposes and does not consider your own investment objectives, financial situation, or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information; investors should do their own research and may seek professional advice before investing.
