GF Securities released a research report stating that the RAG architecture provides long-term memory for large models, with enterprise and personalized demand driving growth in RAG storage requirements. In AI inference, the storage medium for RAG vector databases is transitioning from "memory-assisted retrieval" to an "all-SSD storage architecture," which is expected to steadily increase demand for high-bandwidth, large-capacity SSDs. The report recommends focusing on the core beneficiaries within the industry chain.
In the RAG (Retrieval-Augmented Generation) architecture, the LLM (Large Language Model) queries a vector database before generating a response. The vector database acts as the hub connecting user queries with external knowledge, efficiently storing, managing, and retrieving high-dimensional vectorized knowledge representations, thereby improving the accuracy and timeliness of generated results. On the enterprise side, RAG is gradually penetrating both online scenarios (e-commerce, web search, etc.) and offline scenarios (corporate, legal, engineering research, etc.). On the individual side, personalized RAG retains users' long-term memories, preferences, and contextual information, forming a "user-level vector space"; together these trends significantly drive the growth of RAG demand.
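To make the retrieval step concrete, here is a minimal Python sketch of the query path described above, assuming a generic embedding function; `VectorStore`, `embed`, and `build_prompt` are illustrative names, not any specific vector database's API.

```python
import numpy as np

# Minimal illustration of the RAG retrieval step: documents are embedded into
# high-dimensional vectors; at query time the top-k most similar vectors are
# retrieved and prepended to the LLM prompt as context.

class VectorStore:
    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.texts: list[str] = []

    def add(self, vector: np.ndarray, text: str) -> None:
        v = vector / np.linalg.norm(vector)        # normalize for cosine similarity
        self.vectors = np.vstack([self.vectors, v.astype(np.float32)])
        self.texts.append(text)

    def search(self, query_vec: np.ndarray, k: int = 3) -> list[str]:
        q = query_vec / np.linalg.norm(query_vec)
        scores = self.vectors @ q                  # cosine similarity via dot product
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]

def build_prompt(question: str, store: VectorStore, embed) -> str:
    """Augment the user question with retrieved context before calling the LLM."""
    context = "\n".join(store.search(embed(question)))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

A production system would replace this brute-force dot product with an approximate nearest-neighbor (ANN) index, which is where the storage-media question discussed next comes in.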
Vector database storage media must handle large-scale vector data and index structures while delivering the high throughput and low latency that similarity search demands under high concurrency. The storage medium for vector databases is currently shifting from "memory-assisted retrieval" toward an "all-SSD storage architecture." According to the paper "All-in-storage ANNS Algorithms Optimize VectorDB Usability within a RAG System," KIOXIA AiSAQ stores vectors, PQ (product quantization) results, and indexes uniformly on SSD. At a 10-billion-vector scale, the required SSD capacity is 11.2TB, with PQ vectors occupying 1.28TB and indexes occupying 10TB. Because DiskANN keeps its PQ-compressed vectors in DRAM while AiSAQ moves everything onto TLC/QLC SSDs, AiSAQ offers a 4-7x storage-media cost advantage over DiskANN. Furthermore, all AiSAQ tenants remain active and can begin queries immediately, eliminating the "cold start" delay of loading data from SSD to DRAM before querying, thereby improving the RAG system's scalability and economics.
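Working backward from the cited totals, the implied per-vector footprints can be checked with simple arithmetic; the per-vector sizes below are inferred from the report's figures, not quoted from the paper.

```python
# Back-of-the-envelope check of the cited AiSAQ capacity figures.
# Per-vector sizes are inferred from the report's totals (assumptions,
# not numbers taken from the paper itself).
NUM_VECTORS = 10_000_000_000                       # 10-billion-vector scale

PQ_BYTES_PER_VECTOR = 128                          # 1.28 TB / 1e10 vectors
INDEX_BYTES_PER_VECTOR = 1000                      # 10 TB / 1e10 vectors (~1 KB of graph per vector)

pq_tb = NUM_VECTORS * PQ_BYTES_PER_VECTOR / 1e12
index_tb = NUM_VECTORS * INDEX_BYTES_PER_VECTOR / 1e12
print(f"PQ vectors: {pq_tb:.2f} TB")               # 1.28 TB
print(f"Indexes:    {index_tb:.2f} TB")            # 10.00 TB
print(f"Total:      {pq_tb + index_tb:.2f} TB")    # 11.28 TB (the report rounds to 11.2 TB)
```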
Volcano Engine's TOS Vectors introduces a new paradigm for vector storage and raises the requirements on SSDs. According to the Volcano Engine Developer Community's official account, TOS has launched Vector Bucket, an architecture built on ByteDance's self-developed cloud-native vector index library Kiwi and a multi-level cache coordination architecture spanning DRAM, local SSD, and remote object storage. For large-scale, long-retention, low-query-frequency workloads, this architecture meets the tiering needs of high- and low-frequency data while significantly lowering the barrier for enterprises to use vector data at scale. TOS Vectors integrates deeply with Volcano Engine's high-performance vector database, Volcano AI agent, and other products. In interactive Agent scenarios, frequently accessed memories (such as core user preferences and recent task execution results) are kept in the vector database for millisecond-level high-frequency retrieval, while infrequently accessed memories (such as interaction records from six months ago or historical execution results) are kept in TOS Vectors, trading second-level latency for lower storage costs and a much larger memory space. For Agents handling complex tasks, TOS Vectors can both host massive semantic vector storage and ensure the sustainable accumulation of long-term data.
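The hot/cold routing described above can be sketched as a simple two-tier cache policy; the class below is purely illustrative, using dict-backed stand-ins for the vector database (hot tier) and the TOS Vectors bucket (cold tier), and is not the actual TOS or Kiwi API.

```python
import time
from dataclasses import dataclass, field

# Hypothetical sketch of the hot/cold memory tiering described above: a small,
# frequently accessed tier answering in milliseconds, backed by a large, cheap
# cold tier with second-level latency. All names are illustrative.

@dataclass
class TieredMemory:
    hot: dict = field(default_factory=dict)    # stand-in for the vector database tier
    cold: dict = field(default_factory=dict)   # stand-in for the TOS Vectors bucket tier
    hot_capacity: int = 1000

    def remember(self, key: str, value: str, frequent: bool) -> None:
        # Route by expected access frequency: recent/core memories stay hot.
        tier = self.hot if frequent and len(self.hot) < self.hot_capacity else self.cold
        tier[key] = value

    def recall(self, key: str) -> str | None:
        if key in self.hot:
            return self.hot[key]               # millisecond-class lookup
        value = self.cold.get(key)
        if value is not None:
            time.sleep(1.0)                    # emulate second-level cold-tier latency
            self.hot[key] = value              # promote on access (simple cache policy)
        return value
```

The design choice mirrors the trade the report describes: the cold tier accepts second-level recall latency in exchange for much cheaper capacity, while promotion on access keeps repeatedly used memories in the fast tier.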
Risks include the AI industry's development and demand falling short of expectations; AI server shipments falling short of expectations; and slower-than-expected progress in technology and products from domestic manufacturers.