U.S. Tech Firms Quietly Adopt Chinese AI Models, with Coinbase Leading the Way on GLM and Kimi

American technology companies are quietly integrating Chinese open-source AI models into their production infrastructure. As leading U.S. model services grow more expensive, firms such as Coinbase Global, Inc. (COIN) are making Chinese open-source models their default option, aiming to cut AI costs significantly without curbing usage.

Coinbase CEO Brian Armstrong said in a post on X last Friday evening that the company has set the recently released GLM 5.2 from Zhipu AI and Kimi 2.7 from Beijing-based Moonshot AI as the default models for its engineers, served through an internal LLM gateway. Armstrong said that, combined with measures such as routing optimization and improved caching, the change has cut Coinbase's AI spending by "nearly half," even as token usage continues to grow exponentially.

The Cost Advantage of Chinese Open-Source Models Comes to the Fore

Armstrong noted in his post that 91% of engineers never reached the previous usage limits. Rather than lowering caps or adding spending alerts, Coinbase therefore chose to switch to "cheaper default models."

GLM 5.2 comes from Zhipu AI and Kimi 2.7 from Moonshot AI; both are open-weight models. Armstrong explained that these models handle routine tasks, while engineers can still choose frontier models for work that requires complex reasoning. His rationale is that using top-tier models for execution tasks is often "overkill."

For code review, Coinbase runs multiple models in parallel so that they can cross-check one another's output and maintain quality standards.

Three-Tier Infrastructure Overhaul Drives Cost Reduction

Armstrong outlined three core methods.

The first is intelligent routing: within a custom scheduling framework, the system pre-processes each prompt and, weighing cache hit rates and model pricing, automatically routes the task to the most suitable and cost-effective model. He said the ultimate goal is for AI, not humans, to handle model selection.
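Armstrong's post does not describe how the router is built. As a rough illustration only, the idea of weighing cache hit rates against model pricing can be sketched as picking the cheapest model that is capable enough for the task. Every name, price, and capability tier below is a hypothetical placeholder, not a figure from Coinbase or from any vendor's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                     # hypothetical label, not a real model ID
    price_per_mtok: float         # assumed USD per million uncached input tokens
    cached_price_per_mtok: float  # assumed USD per million cache-hit tokens
    expected_cache_hit: float     # assumed fraction of prompt tokens served from cache
    capability: int               # assumed tier: 1 (routine) to 5 (complex reasoning)


def expected_cost(model: ModelOption, prompt_tokens: int) -> float:
    """Blend cached and uncached prices by the expected cache hit rate."""
    hit = model.expected_cache_hit
    blended = hit * model.cached_price_per_mtok + (1.0 - hit) * model.price_per_mtok
    return prompt_tokens / 1_000_000 * blended


def route(models: list[ModelOption], prompt_tokens: int, required_capability: int) -> ModelOption:
    """Return the cheapest model whose tier meets the task's requirement."""
    eligible = [m for m in models if m.capability >= required_capability]
    if not eligible:
        raise ValueError("no model meets the required capability tier")
    return min(eligible, key=lambda m: expected_cost(m, prompt_tokens))


# Hypothetical catalogue: one cheap open-weight default, one frontier model.
MODELS = [
    ModelOption("open-weight-default", 0.60, 0.10, 0.60, 2),
    ModelOption("frontier", 5.00, 0.50, 0.60, 5),
]

routine = route(MODELS, prompt_tokens=100_000, required_capability=1)
complex_task = route(MODELS, prompt_tokens=100_000, required_capability=4)
print(routine.name, complex_task.name)
```

In this sketch a routine task lands on the cheaper default, and the frontier model is selected only when the required tier rules the default out, which mirrors the "overkill" argument in the post. How a real system would estimate the required tier from a prompt is left open here.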

The second is aggressive caching: Coinbase requires all requests to be cache-aware so that existing caches are reused as much as possible. Armstrong cited LibreChat as an example: once caching was implemented correctly, its cache hit rate jumped from 5% to 60%.
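To see why a jump in hit rate from 5% to 60% matters, consider a back-of-the-envelope calculation. The post gives the hit rates but not the pricing, so the sketch below assumes, purely for illustration, that a cache-hit token is billed at 10% of the normal input price; the actual discount varies by provider and is not stated in the source.

```python
def relative_input_cost(hit_rate: float, cached_discount: float = 0.10) -> float:
    """Input cost relative to a no-cache baseline of 1.0.

    cached_discount is an assumed ratio of cache-hit price to full price.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_discount + (1.0 - hit_rate)


before = relative_input_cost(0.05)  # hit rate before the caching fix
after = relative_input_cost(0.60)   # hit rate after the caching fix
saving = 1.0 - after / before
print(f"before={before:.3f} after={after:.3f} saving={saving:.1%}")
```

Under that assumed discount, input cost per token falls by roughly half on this workload. A smaller discount would shrink the saving, and output tokens are not covered by this estimate.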

The third is context streamlining: Armstrong advises starting new sessions when switching tasks, narrowing file context scope, and disconnecting unused tools. He emphasized that the goal is not to reduce the total number of tokens used, but to reduce "wasted tokens."

Prioritizing Efficiency Over Restriction

Armstrong framed the cost-cutting as a prerequisite for scaling AI adoption, not as a limitation. He noted that engineers remain free to use as many tokens and whichever models they like, but the company has made usage data visible and tied usage to business impact: "the more you spend, the more impact we expect."

He did not disclose absolute spending figures. Even so, cutting expenses nearly in half while usage grows exponentially suggests that Coinbase has at least partly decoupled consumption from cost.
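The decoupling can be made concrete with simple arithmetic: if spending equals tokens consumed times the average cost per token, then the change in cost per token is the change in spending divided by the change in usage. The post gives no growth figure, so the doubling used below is a hypothetical input, and 0.5 stands in for "nearly half."

```python
def implied_unit_cost_ratio(spend_ratio: float, usage_growth: float) -> float:
    """New cost per token relative to the old one.

    spend_ratio: new spending / old spending.
    usage_growth: new token volume / old token volume.
    """
    if spend_ratio <= 0 or usage_growth <= 0:
        raise ValueError("both ratios must be positive")
    return spend_ratio / usage_growth


# Hypothetical scenario: spending halves while token usage doubles.
ratio = implied_unit_cost_ratio(spend_ratio=0.5, usage_growth=2.0)
print(f"cost per token is {ratio:.2f}x the previous level")
```

In that scenario the average cost per token would have to fall to a quarter of its earlier level, which is why cheaper defaults, routing, and caching have to work together rather than any one measure alone.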

Armstrong concluded that this methodology is broadly applicable, and any enterprise can adopt it to achieve sustainable expansion of AI usage without letting cost become a ceiling.

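Disclaimer: Investing involves risk. This article is not investment advice, and the content above should not be regarded as an offer, recommendation, or solicitation to buy or sell any financial product; nor should any related discussion, comment, or post by the author or other users be regarded as such. This article is for general reference only and does not take into account your individual investment objectives, financial situation, or needs. TTM makes no representation or warranty as to the accuracy or completeness of the information; investors should do their own research and seek professional advice before investing.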
