U.S. Tech Firms Quietly Adopt Chinese AI Models, with Coinbase Leading the Way on GLM and Kimi

American technology companies are quietly integrating Chinese open-source AI models into their production infrastructure. As leading U.S. model services grow more expensive, firms such as Coinbase Global, Inc. (COIN) are starting to make Chinese open-source models their default option, aiming to cut AI costs significantly without curbing usage.

Coinbase CEO Brian Armstrong said in a post on X last Friday evening that the company has made the recently released GLM 5.2 from Zhipu AI and Kimi 2.7 from Beijing-based Moonshot AI the default models for its engineers, served through an internal LLM gateway. Armstrong said that, combined with measures such as routing optimization and improved caching, the switch has cut Coinbase's AI spending by "nearly half," even as token usage continues to grow exponentially.

The Cost Advantage of Chinese Open-Source Models Comes to the Fore

Armstrong noted in his post that 91% of engineers never reached the previous usage limits. Rather than lowering caps or adding spending alerts, Coinbase therefore opted to switch to "cheaper default models."

GLM 5.2 comes from Zhipu AI and Kimi 2.7 from Moonshot AI; both are open-weight models. Armstrong explained that these models handle routine tasks, while engineers can still choose cutting-edge models for assignments that require complex reasoning. His rationale is that using top-tier models for execution tasks is often "overkill."

For code review, a multi-model parallel strategy is employed, allowing different models to cross-check outputs to maintain quality standards.

Three-Tier Infrastructure Overhaul Drives Cost Reduction

Armstrong outlined three core methods.

The first is intelligent routing: Within a custom scheduling framework, the system pre-processes prompts and, considering cache hit rates and model pricing, automatically routes tasks to the most suitable and cost-effective model. He stated that the ultimate goal is for AI, not humans, to handle the task of model selection.
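The routing idea described above can be illustrated with a minimal sketch. This is not Coinbase's implementation, which the post does not detail; the `ModelOption` fields, the prices, the `handles_complex` flag, and the `route` function are all hypothetical, and the sketch simply picks the cheapest eligible model after accounting for an estimated cache hit rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """Hypothetical catalog entry; prices are USD per million tokens."""
    name: str
    input_price: float         # uncached input tokens
    cached_input_price: float  # input tokens served from cache
    output_price: float
    handles_complex: bool      # assumed flag: suited to complex reasoning


def expected_cost(model: ModelOption, input_tokens: int,
                  output_tokens: int, cache_hit_rate: float) -> float:
    """Estimate the cost of one request given an expected cache hit rate."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total = (uncached * model.input_price
             + cached * model.cached_input_price
             + output_tokens * model.output_price)
    return total / 1_000_000


def route(models: list[ModelOption], input_tokens: int, output_tokens: int,
          cache_hit_rates: dict[str, float],
          needs_complex_reasoning: bool) -> ModelOption:
    """Return the cheapest model that is eligible for the task."""
    candidates = [m for m in models
                  if m.handles_complex or not needs_complex_reasoning]
    if not candidates:
        raise ValueError("no eligible model for this task")
    return min(
        candidates,
        key=lambda m: expected_cost(m, input_tokens, output_tokens,
                                    cache_hit_rates.get(m.name, 0.0)),
    )
```

In a real gateway the `needs_complex_reasoning` signal would itself come from the prompt pre-processing step the post mentions; here it is passed in directly to keep the sketch small.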

The second is aggressive caching: Coinbase requires all requests to be cache-aware, maximizing the reuse of existing caches. Armstrong cited LibreChat as an example: once caching was implemented correctly, its cache hit rate jumped from 5% to 60%.
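To see why a higher hit rate matters for cost, here is a rough arithmetic sketch. It assumes cached input tokens are billed at a fixed fraction of the normal input price (`cached_ratio`); that ratio is an illustrative assumption, not a figure from the post, and the function names are hypothetical.

```python
def blended_input_price(base_price: float, hit_rate: float,
                        cached_ratio: float) -> float:
    """Average price per input token when a share of tokens hits the cache.

    cached_ratio is the assumed price of a cached token relative to an
    uncached one (for example 0.1 means cached tokens cost 10% as much).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return base_price * ((1.0 - hit_rate) + hit_rate * cached_ratio)


def input_savings(hit_before: float, hit_after: float,
                  cached_ratio: float) -> float:
    """Fractional reduction in input-token spend after raising the hit rate."""
    before = blended_input_price(1.0, hit_before, cached_ratio)
    after = blended_input_price(1.0, hit_after, cached_ratio)
    return 1.0 - after / before
```

With an assumed `cached_ratio` of 0.1, `input_savings(0.05, 0.60, 0.1)` works out to roughly 0.52, meaning input-token spend for that workload would fall by about half. The estimate covers input tokens only and depends entirely on the assumed ratio.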

The third is context streamlining: Armstrong advises starting new sessions when switching tasks, narrowing file context scope, and disconnecting unused tools. He emphasized that the goal is not to reduce the total number of tokens used, but to reduce "wasted tokens."

Prioritizing Efficiency Over Restriction

Armstrong framed this cost-cutting as a prerequisite for scaling AI adoption, not as a limitation. He noted that engineers remain free to use any number of tokens and any model, but the company has made usage data visible and linked usage to business impact: "the more you spend, the more impact we expect."

He did not disclose absolute spending figures. Structurally, however, cutting expenses nearly in half while usage grows exponentially suggests Coinbase has at least partly decoupled consumption from cost.
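The decoupling can be expressed as a simple ratio. The symbols are introduced here for illustration and do not appear in the post: S is total AI spend, T is total tokens consumed, c is the average cost per token, subscripts 0 and 1 mark before and after the changes, and g = T_1 / T_0 is the usage growth factor.

```latex
c = \frac{S}{T},
\qquad
\frac{c_1}{c_0} = \frac{S_1 / S_0}{T_1 / T_0} \approx \frac{0.5}{g}
```

If, hypothetically, usage had doubled (g = 2) while spend roughly halved, the average cost per token would have fallen to about a quarter of its earlier level. The actual value of g was not disclosed.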

Armstrong concluded that this methodology is broadly applicable, and any enterprise can adopt it to achieve sustainable expansion of AI usage without letting cost become a ceiling.

