CoreWeave trains DeepSeek-V3 in 2 minutes, setting AI cloud record

CoreWeave trained the 671-billion-parameter DeepSeek-V3 model in just over two minutes, a result that validates the AI-native cloud provider's full-stack infrastructure strategy.

CoreWeave Inc. trained DeepSeek-V3, a 671-billion-parameter model, in 2.02 minutes on 8,192 NVIDIA GB300 GPUs — the fastest result in the MLPerf Training v6.0 benchmark and the largest GB300 cluster submitted in the round.

"Training DeepSeek-V3 in two minutes on the largest GB300 cluster reflects years of metal-to-model engineering investment," Chen Goldberg, executive vice president of product and engineering at CoreWeave, said.

The company demonstrated near-linear scaling across three cluster sizes: 2.02 minutes on 8,192 GPUs, 3.09 minutes on 4,096 GPUs and 5.54 minutes on 2,048 GPUs. CoreWeave also trained Llama-3.1-405B in 9.77 minutes on 4,096 GB300 GPUs, using 20 percent fewer GPUs than comparable GB200 deployments. On a compact 64-GPU B200 cluster, it trained GPT-OSS-20B in 26.98 minutes and Llama-3.1-8B in 16.54 minutes.
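As a rough check on the DeepSeek-V3 scaling figures above, the following minimal Python sketch compares each run's speedup with the ideal speedup from adding GPUs. The names `runs` and `scaling_efficiency` are illustrative and do not come from CoreWeave or MLCommons. The calculation assumes time-to-train should fall in direct proportion to GPU count, which is a simplification: time-to-train can also reflect fixed startup cost and convergence behavior at larger scale.

```python
# DeepSeek-V3 time-to-train in minutes, keyed by GB300 GPU count,
# as reported in the article.
runs = {2048: 5.54, 4096: 3.09, 8192: 2.02}


def scaling_efficiency(base_gpus, base_minutes, gpus, minutes):
    """Return measured speedup divided by ideal (linear) speedup."""
    speedup = base_minutes / minutes
    ideal = gpus / base_gpus
    return speedup / ideal


# Use the smallest cluster as the baseline (an assumption of this sketch).
base_gpus = min(runs)
efficiencies = {
    gpus: scaling_efficiency(base_gpus, runs[base_gpus], gpus, minutes)
    for gpus, minutes in runs.items()
}

for gpus, eff in sorted(efficiencies.items()):
    print(f"{gpus} GPUs: {eff:.1%} of linear scaling")
```

Under these assumptions, the 4,096-GPU run lands at roughly 90 percent of linear scaling and the 8,192-GPU run at roughly 69 percent, both relative to the 2,048-GPU baseline.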

The results, achieved on the same infrastructure available to customers, strengthen CoreWeave's position against hyperscalers in the specialized AI training market. CoreWeave shares have traded on Nasdaq under the ticker CRWV since the company's March 2025 listing.

What the MLPerf v6.0 Results Reveal About the AI Training Market

MLPerf Training v6.0, released June 16 by MLCommons, added two new benchmarks, DeepSeek-V3 and GPT-OSS-20B, both built on a Mixture-of-Experts (MoE) architecture, which activates only a fraction of a model's total parameters per token. DeepSeek-V3 has 671 billion total parameters with 37 billion activated per token, making it the largest benchmark in the suite's history. GPT-OSS-20B, with 21 billion total parameters and 3.6 billion activated, was designed as an entry point for organizations with smaller hardware configurations.
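To make "a fraction of a model's total parameters" concrete, this short Python sketch computes the share of parameters active per token from the figures quoted above. The dictionary layout and the `active_fraction` helper are illustrative choices, not part of any MLPerf tooling.

```python
# (total parameters, parameters activated per token), from the article.
models = {
    "DeepSeek-V3": (671e9, 37e9),
    "GPT-OSS-20B": (21e9, 3.6e9),
}


def active_fraction(total_params, active_params):
    """Share of a model's parameters used for each token."""
    return active_params / total_params


fractions = {
    name: active_fraction(total, active)
    for name, (total, active) in models.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

On these numbers, DeepSeek-V3 activates about 5.5 percent of its parameters per token and GPT-OSS-20B about 17 percent.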

The round drew 24 submitting organizations across 95 unique systems, using 13 different hardware accelerators and 19 host processors. Cloud system submissions more than doubled compared with the v5.1 round six months earlier, reflecting the growing market for hosted AI training. Sixty percent of submitted systems were multi-node.

"The gap between benchmark performance and production reality remains one of the most persistent challenges in AI infrastructure," Brendan Burke, research director at Futurum Research, said. "CoreWeave's MLPerf Training v6.0 results, particularly training DeepSeek-V3 in two minutes on the largest GB300 cluster in the benchmark, demonstrate that full stack AI expertise compounds real-world performance gains as new hardware arrives."

How CoreWeave's Infrastructure Stack Drove the Results

CoreWeave attributed its performance to optimizations across every layer of its platform. CoreWeave Mission Control performs continuous health checks on rack-scale systems, validating hardware, firmware, network and thermal conditions before and during large-scale training jobs to reduce stragglers. The company's SUNK scheduler is topology-aware, co-locating expert-parallel groups within the same NVL72 domain to minimize inter-rack communication for MoE workloads. A rail-aware networking strategy balances traffic across the fabric to prevent hotspots at multi-thousand-GPU scale.
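The idea of keeping an expert-parallel group inside one NVL72 domain can be illustrated with a toy Python sketch. This is not SUNK's algorithm, which the article does not describe. The function names, the 144-GPU cluster and the group size of 32 are all hypothetical; the only fixed fact used is that an NVL72 domain contains 72 GPUs.

```python
NVL72_DOMAIN_SIZE = 72  # GPUs in one NVL72 rack-scale domain


def domain_of(gpu_index, domain_size=NVL72_DOMAIN_SIZE):
    """Map a flat GPU index to the index of its NVL72 domain."""
    return gpu_index // domain_size


def naive_groups(num_gpus, group_size):
    """Pack contiguous groups while ignoring domain boundaries."""
    last_start = num_gpus - group_size
    return [list(range(s, s + group_size))
            for s in range(0, last_start + 1, group_size)]


def aligned_groups(num_gpus, group_size, domain_size=NVL72_DOMAIN_SIZE):
    """Pack contiguous groups that never cross a domain boundary."""
    if group_size > domain_size:
        raise ValueError("group does not fit inside one domain")
    groups = []
    for domain_start in range(0, num_gpus, domain_size):
        domain_end = min(domain_start + domain_size, num_gpus)
        last_start = domain_end - group_size
        for s in range(domain_start, last_start + 1, group_size):
            groups.append(list(range(s, s + group_size)))
    return groups


def straddling(groups):
    """Count groups whose GPUs span more than one domain."""
    return sum(1 for g in groups if len({domain_of(i) for i in g}) > 1)


# Hypothetical example: two NVL72 domains, expert-parallel groups of 32.
naive = naive_groups(144, 32)
aligned = aligned_groups(144, 32)
print("naive groups crossing domains:", straddling(naive))
print("aligned groups crossing domains:", straddling(aligned))
```

In this toy case the naive packing puts one of four groups across two domains, while the aligned packing puts none across, at the cost of leaving eight GPUs per domain unassigned. A production scheduler would presumably also weigh group sizes that divide a domain evenly, along with hardware health and network layout.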

The runs used NVIDIA NeMo Framework Release 26.04 with CUDA graphs and tensor-, pipeline- and context-parallel sharding tailored to the GB300 NVL72 topology, plus NVIDIA Spectrum-X Ethernet running RoCE for the scale-out fabric.

CoreWeave was the only submitter to scale a GB300 platform beyond 2,048 GPUs on DeepSeek-V3. The company is also the only AI cloud to earn the top Platinum ranking in both SemiAnalysis ClusterMAX 1.0 and 2.0.

What This Means for the AI Cloud Competitive Landscape

CoreWeave's benchmark results arrive as demand for AI training infrastructure accelerates. Shares of Sharon AI (SHAZ) surged about 25 percent on Friday after the company announced a six-year strategic compute collaboration with NVIDIA that could include up to 40,000 GB300 GPUs across 72 megawatts of new data-center capacity in Australia. The deal expands Sharon AI's total AI factory footprint to 132 megawatts.

For CoreWeave, the MLPerf results provide independent validation of its platform at a time when enterprises are evaluating cloud providers for large-scale AI workloads. The company's ability to deliver near-linear scaling on the most demanding MoE models, on the same infrastructure it offers customers, gives it a measurable differentiator against Amazon Web Services, Microsoft Azure and Google Cloud, which also submitted results in the v6.0 round.

CoreWeave's stock, which listed in March 2025, has been a proxy for the AI infrastructure buildout. The MLPerf results give investors a concrete benchmark for evaluating whether the company's full-stack approach translates into a sustainable competitive advantage as the AI training market shifts toward sparse computation architectures.

This article is for informational purposes only and does not constitute investment advice.