Chinese AI models gain routing share as prices undercut US rivals by about 94%

Chinese AI models are capturing a growing share of inference routing traffic as their API costs run at a fraction of US competitors, reshaping the economics of the AI market.

Chinese AI models from DeepSeek, Alibaba's Qwen, and ByteDance are winning an increasing share of model routing queries as developers redirect non-sensitive workloads to the cheapest inference provider, threatening the pricing power of OpenAI, Anthropic, and Google.

The trend was highlighted in a June 8 CNBC report by Deirdre Bosa, which examined how model routing platforms are increasingly directing traffic to Chinese providers as the cost gap widens across the industry.

DeepSeek's API pricing for its V3 model runs at roughly $0.14 per million input tokens, compared with $2.50 for OpenAI's GPT-4o, a discount of about 94%. Alibaba's Qwen 2.5 and ByteDance's Doubao models offer similar pricing advantages, making them the default choice for routing platforms that optimize for cost over raw capability.
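The size of that gap can be checked with a few lines of arithmetic. The sketch below uses only the two per-million-token input prices quoted in this article; the 500-million-token monthly workload and the function names are assumptions chosen for illustration, and output-token pricing is ignored.

```python
# Input prices quoted in the article, in USD per million input tokens.
DEEPSEEK_V3_INPUT = 0.14
GPT_4O_INPUT = 2.50

# Hypothetical workload size (an assumption, not a figure from the article).
MONTHLY_INPUT_TOKENS = 500_000_000


def discount_pct(cheaper: float, pricier: float) -> float:
    """Return how much lower `cheaper` is than `pricier`, as a percentage."""
    return (1 - cheaper / pricier) * 100


def input_cost(tokens: int, price_per_million: float) -> float:
    """Return the USD cost of `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    gap = discount_pct(DEEPSEEK_V3_INPUT, GPT_4O_INPUT)
    print(f"DeepSeek V3 input price is {gap:.1f}% below GPT-4o")
    for name, price in [("DeepSeek V3", DEEPSEEK_V3_INPUT), ("GPT-4o", GPT_4O_INPUT)]:
        cost = input_cost(MONTHLY_INPUT_TOKENS, price)
        print(f"{name}: ${cost:,.2f} per month for input tokens")
```

At the assumed volume, the same input traffic costs about $70 a month on DeepSeek V3 versus about $1,250 on GPT-4o, which is the kind of difference a cost-optimizing router is built to exploit.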

The shift threatens the revenue models of US AI leaders that have built their businesses on premium API pricing. OpenAI alone is expected to generate more than $10 billion in revenue this year, much of it from API access. If routing platforms continue to divert traffic to Chinese providers, US companies may be forced to cut prices, compressing margins across the industry.

How Model Routing Reshapes the Inference Market

Model routing platforms such as OpenRouter and Together AI automatically evaluate incoming queries and direct each one to the model offering the best balance of capability and cost. For tasks like summarization, translation, and basic code generation, which account for the majority of inference volume, Chinese models often deliver comparable quality at a fraction of the price. That gives Chinese providers a structural advantage that US labs cannot easily counter without cutting their own prices.
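The routing logic described above can be illustrated with a minimal sketch: pick the cheapest model that clears a quality bar for the task. This is not how OpenRouter or any specific platform is implemented. The quality scores, task thresholds, and model identifiers below are hypothetical values invented for illustration; only the two input prices come from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float  # USD per million input tokens
    quality: float      # hypothetical 0-1 capability score, not a real benchmark


# Prices are the figures quoted in the article; quality scores are invented.
CATALOG = [
    Model("deepseek-v3", 0.14, 0.80),
    Model("gpt-4o", 2.50, 0.90),
]

# Hypothetical minimum quality each task type is assumed to need.
TASK_THRESHOLDS = {
    "summarization": 0.70,
    "translation": 0.70,
    "complex-reasoning": 0.85,
}


def route(task: str, catalog: list[Model] = CATALOG) -> Model:
    """Pick the cheapest model whose quality meets the task's threshold."""
    threshold = TASK_THRESHOLDS[task]
    eligible = [m for m in catalog if m.quality >= threshold]
    if not eligible:
        raise ValueError(f"no model meets quality {threshold} for task {task!r}")
    return min(eligible, key=lambda m: m.input_price)


if __name__ == "__main__":
    for task in TASK_THRESHOLDS:
        print(f"{task} -> {route(task).name}")
```

Under these assumed scores, routine tasks go to the cheaper model and only the task with the higher quality bar reaches the premium one, which mirrors the dynamic the article describes: the bulk of volume flows to whichever provider is cheapest once quality is "good enough."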

Who Wins, Who Loses

The biggest beneficiaries are cloud infrastructure providers that support multi-model routing, including AWS, Google Cloud, and Alibaba Cloud, which earn compute revenue regardless of which model wins the routing decision. Nvidia also benefits from increased total compute demand: every inference query still requires GPU cycles, and routing platforms drive higher overall utilization.

The biggest losers are US AI labs that have invested billions in training frontier models but now face a pricing war they may struggle to win. OpenAI has raised more than $20 billion in funding, much of it spent on training compute and talent. If routing platforms commoditize inference, the economics that justified those investments begin to break down.

For investors, the key question is whether US AI companies can maintain their pricing power. OpenAI, Anthropic, and Google DeepMind have relied on premium API pricing to fund massive training runs. DeepSeek's V3 model was trained for roughly $6 million in compute costs, compared with the hundreds of millions spent on comparable US models, a cost structure that could allow Chinese providers to keep undercutting US pricing. Morgan Stanley analysts have flagged inference pricing as a key risk for AI infrastructure valuations, noting that a sustained price war could reduce projected returns on the $200 billion in AI data center CapEx planned through 2027.

This article is for informational purposes only and does not constitute investment advice.