Hyperscalers are building massive data centers because they believe the AI race will be won on scale. They are connecting those data centers with high-bandwidth fiber so they can act as a single unit for AI workloads, keeping them competitive.
If Google is excluded, over 98% of global AI workloads run on NVIDIA chips. Including Google, the percentage drops to around 70%, as Google runs most of its production AI workloads on its proprietary chips.
NVIDIA's dominance stems from three key areas: superior software (CUDA), better hardware performance, and advanced networking capabilities (via its Mellanox acquisition). No other semiconductor company excels across all three.
NVIDIA's vulnerability lies in its inference software, which is less of a moat than its training software. If competitors can optimize inference on other hardware, NVIDIA's dominance in inference could be challenged, even though its hardware remains superior.
Scaling pre-training is challenging due to diminishing returns as models grow larger and data becomes scarcer. However, synthetic data generation and inference-time compute offer new avenues to improve models without relying solely on pre-training.
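To make the diminishing-returns point concrete, here is a minimal sketch assuming a Chinchilla-style power law for pre-training loss; the constants are roughly the published Chinchilla fits and are used purely for illustration, not figures discussed in the episode.

```python
# Illustrative only: assumes a Chinchilla-style power law
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants are roughly the published Chinchilla fits, used here as an example.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(params: float, tokens: float) -> float:
    """Predicted pre-training loss for `params` parameters trained on `tokens` tokens."""
    return E + A / params**alpha + B / tokens**beta

# Doubling both model size and data repeatedly buys a smaller loss improvement each time.
for n, d in [(70e9, 1.4e12), (140e9, 2.8e12), (280e9, 5.6e12)]:
    print(f"{n:.0e} params, {d:.0e} tokens -> predicted loss {loss(n, d):.3f}")
```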
Hyperscalers are building larger clusters because they still see value in scaling models, even if pre-training gains are logarithmic rather than linear. Synthetic data generation and inference-time compute require significant compute power, driving the need for larger clusters.
Inference-time reasoning involves models generating multiple possibilities and refining their outputs during inference, which requires more compute than traditional inference. This process can increase token generation by 10x, making it significantly more compute-intensive.
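As a rough sketch of why this multiplies compute, consider a simple best-of-N sampling loop; `generate` and `score` below are hypothetical placeholders for a model call and a verifier, not APIs mentioned in the episode.

```python
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 10) -> str:
    """Sample n candidate reasoning chains and keep the highest-scoring one.

    Generating n candidates costs roughly n times the output tokens of a single
    response, which is where a ~10x token multiplier comes from.
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```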
NVIDIA focuses on supply chain optimization, driving new technologies to market faster than competitors. This includes advancements in networking, cooling, and power delivery, ensuring it stays ahead in performance and cost efficiency.
Memory technology, particularly HBM (High Bandwidth Memory), is critical for AI chips because reasoning models need vast amounts of memory to handle long context lengths. NVIDIA's cost of goods sold for HBM is growing faster than its silicon costs, underscoring HBM's importance.
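A back-of-the-envelope sketch of why long contexts drive HBM demand: the KV cache a GPU must hold grows linearly with context length. The model dimensions below are illustrative assumptions, not figures from the episode.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """KV-cache size for one sequence: 2 (K and V) * layers * kv_heads * head_dim * tokens * dtype bytes."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value

# Hypothetical 70B-class model: 80 layers, 8 KV heads, head_dim 128, fp16 values.
for ctx in (8_192, 128_000, 1_000_000):
    gb = kv_cache_bytes(80, 8, 128, ctx) / 1e9
    print(f"{ctx:>9,} tokens of context -> ~{gb:.1f} GB of KV cache per sequence")
```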
AMD excels in silicon engineering but lags in software and system-level design. While AMD's GPUs offer more memory and memory bandwidth, they fall short of NVIDIA in networking and software, limiting their overall competitiveness in AI workloads.
Google's TPU has limited commercial success due to internal software restrictions, pricing that doesn't compete well with market alternatives, and Google's preference to use most of its TPU capacity internally for its own workloads.
Broadcom's growth is driven by its custom ASIC wins with companies like Google and Meta, as well as its leadership in networking technology. Broadcom is also well-positioned to challenge NVIDIA's NVSwitch, one of NVIDIA's key networking advantages.
The risks in 2026 include whether models continue to improve at a rapid pace and if hyperscalers can sustain their spending levels. If models stop improving significantly, there could be a reckoning where hyperscalers cut back on spending, impacting the entire ecosystem.
Open Source bi-weekly convo w/ Bill Gurley and Brad Gerstner on all things tech, markets, investing & capitalism. This week they are joined by Dylan Patel, Founder & Chief Analyst at SemiAnalysis, to discuss the origins of SemiAnalysis, Google's AI workload, NVIDIA's competitive edge, the shift to GPUs in data centers, the challenges of scaling AI pre-training, synthetic data generation, hyperscaler capital expenditures, the paradox of building bigger clusters despite claims that pre-training is obsolete, inference-time compute, the Cisco comparison to NVIDIA, evolving memory technology, chip competition, predictions for 2025 and 2026, & more. Enjoy another episode of BG2!
Timestamps:
(00:00) Intro
(01:50) Dylan Patel Backstory
(02:36) SemiAnalysis Backstory
(04:18) Google's AI Workload
(06:58) NVIDIA's Edge
(10:59) NVIDIA's Incremental Differentiation
(13:12) Potential Vulnerabilities for NVIDIA
(17:18) The Shift to GPUs: What It Means for Data Centers
(22:29) AI Pre-training Scaling Challenges
(29:43) If Pretraining Is Dead, Why Bigger Clusters?
(34:00) Synthetic Data Generation
(36:26) Hyperscaler CapEx
(38:12) Pre-training and Inference-time Reasoning
(41:00) Cisco Comparison to NVIDIA
(44:11) Inference-time Compute
(53:18) The Future of AI Models and Market Dynamics
(01:00:58) Evolving Memory Technology
(01:06:46) Chip Competition
(01:07:18) AMD
(01:10:35) Google’s TPU
(01:14:56) Cerebras and Groq
(01:14:51) Amazon's Trainium
(01:17:33) Predictions for 2025 and 2026
Available on Apple, Spotify, www.bg2pod.com
Follow:
Brad Gerstner @altcap
Bill Gurley @bgurley
Dylan Patel @dylan522p
BG2 Pod @bg2pod