SemiAnalysis8/20/2025

H100 vs GB200 NVL72 Training Benchmarks – Power, TCO, and Reliability Analysis, Software Improvement Over Time

Dylan Patel
H100 vs GB200 NVL72 Training Benchmarks - Power, TCO, and Reliability Analysis, Software Improvement Over Time Joules per Token, TCO Per Million Tokens, MFU, Tokens Per US Annual Household Energy Usage, DeepSeek 670B, GB200 Unreliability, Backplane Downtime Frontier model training has pushed GPUs and AI systems to their absolute limits, making cost, efficiency, power, performance per TCO, and reliability central to the discussion on effective training. The Hopper vs Blackwell comparisons are...