American technology companies are quietly integrating Chinese open-source AI models into their production infrastructure. As the cost of leading U.S. model services keeps rising, firms such as Coinbase Global, Inc. (COIN) are starting to make Chinese open-source models their default, aiming to cut AI expenses significantly without curbing usage.
Coinbase CEO Brian Armstrong revealed in a post on X last Friday evening that the company has made the recently released GLM 5.2 from Zhipu AI and Kimi 2.7 from Beijing-based Moonshot AI the default models for engineers through its internal LLM gateway. Armstrong said that, combined with measures such as routing optimization and improved caching, the change has cut Coinbase's AI spending by "nearly half," even as token usage continues to grow exponentially.
The Cost Advantage of Chinese Open-Source Models Comes to the Fore
Armstrong noted in his post that 91% of engineers had never hit the previous usage limits. So rather than lowering caps or adding spending alerts, Coinbase chose to switch to "cheaper default models."
Both GLM 5.2 and Kimi 2.7 are open-weight models. Armstrong explained that they handle routine tasks, while engineers can still choose frontier models for work that requires complex reasoning. Using a top-tier model for straightforward execution tasks, he argued, is often "overkill."
For code review, Coinbase runs several models in parallel so that they cross-check one another's output and keep quality standards intact.
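Armstrong did not describe how the cross-check works. One plausible reading is a simple consensus rule, and the Python sketch below illustrates that assumption only: findings raised by at least two models are treated as confirmed, and the rest are set aside for a human. The reviewer functions, the `min_agree` threshold, and the finding labels are all hypothetical stand-ins for real model calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A reviewer takes a diff and returns the set of issues it flags.
# In practice each reviewer would wrap a call to a different model; here they are stubs.
Reviewer = Callable[[str], set[str]]

def cross_check_review(diff: str, reviewers: list[Reviewer], min_agree: int = 2) -> dict[str, list[str]]:
    # Run every reviewer on the same diff concurrently.
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        all_findings = list(pool.map(lambda review: review(diff), reviewers))
    # Count how many reviewers raised each issue.
    votes = Counter(issue for findings in all_findings for issue in findings)
    return {
        "confirmed": sorted(i for i, n in votes.items() if n >= min_agree),
        "needs_human_look": sorted(i for i, n in votes.items() if n < min_agree),
    }

# Hypothetical stub reviewers standing in for three different models.
def model_a(diff: str) -> set[str]:
    return {"missing null check", "unused import"}

def model_b(diff: str) -> set[str]:
    return {"missing null check"}

def model_c(diff: str) -> set[str]:
    return {"missing null check", "sql injection risk"}

report = cross_check_review("<diff text>", [model_a, model_b, model_c])
print(report)
```

Here only the issue all three stubs agree on is confirmed; the two single-model findings are routed to a person rather than discarded, which is one way to use disagreement between models as a quality signal.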
Three Infrastructure Changes Drive the Cost Reduction
Armstrong outlined three core methods.
The first is intelligent routing. Within a custom scheduling framework, the system preprocesses each prompt and, taking cache hit rates and model pricing into account, automatically routes the task to the most suitable and cost-effective model. The ultimate goal, he said, is for AI rather than humans to decide which model to use.
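The post gives no implementation details, but the idea of weighing expected cache hits against model pricing can be sketched in a few lines of Python. Everything here is an assumption for illustration: the model names, tiers, and per-million-token prices are invented, `required_tier` is a naive keyword stand-in for the AI-driven classification Armstrong describes, and only input tokens are priced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    tier: int                     # hypothetical capability tier: 1 = routine, 2 = complex reasoning
    price_per_mtok: float         # hypothetical uncached input price, USD per million tokens
    cached_price_per_mtok: float  # hypothetical input price for tokens served from cache

def required_tier(prompt: str) -> int:
    # Naive keyword placeholder; a real router might ask a small model to classify the prompt.
    hard_markers = ("design", "architecture", "prove", "concurrency")
    return 2 if any(m in prompt.lower() for m in hard_markers) else 1

def expected_cost(model: Model, prompt_tokens: int, hit_rate: float) -> float:
    # Blend cached and uncached prices by the expected share of input tokens hitting the cache.
    # Output tokens are ignored to keep the sketch short.
    blended = hit_rate * model.cached_price_per_mtok + (1 - hit_rate) * model.price_per_mtok
    return prompt_tokens / 1_000_000 * blended

def route(prompt: str, prompt_tokens: int, models: list[Model], hit_rates: dict[str, float]) -> Model:
    # Only models at or above the required tier are eligible; pick the cheapest of those.
    tier = required_tier(prompt)
    eligible = [m for m in models if m.tier >= tier]
    return min(eligible, key=lambda m: expected_cost(m, prompt_tokens, hit_rates.get(m.name, 0.0)))

MODELS = [
    Model("open-a", 1, 1.00, 0.10),
    Model("open-b", 1, 0.80, 0.40),
    Model("frontier", 2, 15.00, 1.50),
]

print(route("Rename this variable", 10_000, MODELS, {}).name)               # open-b
print(route("Rename this variable", 10_000, MODELS, {"open-a": 0.5}).name)  # open-a
print(route("Design the service architecture", 10_000, MODELS, {}).name)    # frontier
```

The second call shows why cache hit rates belong in the routing decision: `open-a` has the higher list price, but once half of its input is expected to come from cache, its blended price drops below `open-b`'s.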
The second is aggressive caching. Coinbase requires every request to be cache-aware so that existing cached content is reused as much as possible. In LibreChat, for example, the cache hit rate jumped from 5% to 60% once caching was implemented correctly.
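"Cache-aware" is not defined in the post. With provider-side prompt caching, which typically matches on a shared request prefix, one common interpretation is to keep stable content (system prompt, tool definitions) at the front of every request and push anything that changes per request to the end. The toy Python model below is an illustration under that assumption, not Coinbase's or LibreChat's implementation; it approximates tokens as words and shows how much request ordering alone can matter.

```python
import hashlib

class PrefixCache:
    """Toy model of prefix-based prompt caching. A segment counts as a cache hit only if
    the entire request prefix ending at that segment has been seen before.
    Tokens are approximated by whitespace-separated words."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.hit_tokens = 0
        self.total_tokens = 0

    def submit(self, segments: list[str]) -> None:
        prefix = hashlib.sha256()
        still_matching = True
        for seg in segments:
            prefix.update(seg.encode() + b"\x00")  # separator keeps segment boundaries distinct
            key = prefix.hexdigest()
            n = len(seg.split())
            self.total_tokens += n
            if still_matching and key in self.seen:
                self.hit_tokens += n
            else:
                still_matching = False  # once the prefix diverges, nothing after it can hit
            self.seen.add(key)

    @property
    def hit_rate(self) -> float:
        return self.hit_tokens / self.total_tokens if self.total_tokens else 0.0

SYSTEM = "You are a code assistant. " * 50   # stable system prompt (250 words)
TOOLS = "tool schema " * 100                 # stable tool definitions (200 words)

def simulate(stable_first: bool, requests: int = 10) -> float:
    cache = PrefixCache()
    for i in range(requests):
        stamp = f"timestamp {i}"          # changes on every request
        question = f"question number {i}"
        if stable_first:
            cache.submit([SYSTEM, TOOLS, stamp, question])
        else:
            cache.submit([stamp, SYSTEM, TOOLS, question])
    return cache.hit_rate

print(round(simulate(stable_first=True), 2))   # 0.89
print(simulate(stable_first=False))            # 0.0
```

With the stable segments first, roughly 89% of tokens in this simulation are served from cache; putting a per-request timestamp first drops the hit rate to zero, because every prefix after it is new.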
The third is context streamlining. Armstrong advises starting a new session when switching tasks, narrowing the scope of files included in the context, and disconnecting unused tools. The goal, he emphasized, is not to reduce the total number of tokens used but to reduce "wasted tokens."
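The three habits Armstrong lists can be expressed as a single context-building function. This Python sketch is hypothetical throughout: the `Session` type, the path-prefix notion of file scope, the per-task tool allowlist, and word counts as a proxy for tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Session:
    task: str
    history: list[str] = field(default_factory=list)

def count_tokens(text: str) -> int:
    # Rough proxy: whitespace-separated words stand in for model tokens.
    return len(text.split())

def build_context(session: Optional[Session], task: str, message: str,
                  files: dict[str, str], scope: str,
                  tools: dict[str, str], needed_tools: set[str]) -> tuple[Session, list[str]]:
    # 1. Start a fresh session on a task switch so unrelated history is dropped.
    if session is None or session.task != task:
        session = Session(task)
    # 2. Include only files whose path falls under the directory in scope.
    file_ctx = [body for path, body in files.items() if path.startswith(scope)]
    # 3. Attach only the tool schemas this task actually needs.
    tool_ctx = [schema for name, schema in tools.items() if name in needed_tools]
    session.history.append(message)
    return session, tool_ctx + file_ctx + session.history

# Usage with hypothetical files and tools.
files = {"src/payments/ledger.py": "ledger code " * 100, "src/ui/app.tsx": "ui code " * 300}
tools = {"git": "git schema " * 20, "browser": "browser schema " * 200}

s, ctx = build_context(None, "fix-ledger", "fix rounding bug", files, "src/payments/", tools, {"git"})
scoped = sum(count_tokens(c) for c in ctx)
unscoped = sum(count_tokens(t) for t in [*files.values(), *tools.values(), "fix rounding bug"])
s2, ctx2 = build_context(s, "update-docs", "rewrite README intro", files, "docs/", tools, set())
print(scoped, unscoped, ctx2)
```

In this example the scoped ledger task carries 243 word-tokens instead of the 1,243 it would with every file and tool attached, and switching to the docs task starts clean instead of dragging the ledger history along. Total usage is not capped anywhere; only content the task does not need is left out.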
Prioritizing Efficiency Over Restriction
Armstrong framed the cost-cutting not as a restriction but as a prerequisite for scaling AI adoption. Engineers remain free to use as many tokens and whichever models they want, he noted, but the company has made usage data visible and tied usage to business impact: "the more you spend, the more impact we expect."
He did not disclose absolute spending figures. Still, cutting expenses nearly in half while usage grows exponentially implies that Coinbase's average cost per token has fallen sharply, meaning the company has at least partly decoupled consumption from cost.
Armstrong concluded that the approach is broadly applicable: any company can adopt it to scale AI usage sustainably without letting cost become a ceiling.
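Because no figures were disclosed, the relationship can only be written out symbolically. Let $S$ be spending, $T$ the number of tokens consumed, and $\bar{p} = S/T$ the average price paid per token, with subscripts 0 and 1 for before and after the changes. If spending roughly halves while usage grows by some factor $g > 1$:

$$
\frac{\bar{p}_1}{\bar{p}_0} = \frac{S_1}{S_0}\cdot\frac{T_0}{T_1} \approx \frac{1}{2}\cdot\frac{1}{g} = \frac{1}{2g}
$$

For a purely illustrative $g = 3$ (usage tripling), the average price per token would have fallen to about one-sixth of its earlier level.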