The current training paradigm assumes that all GPUs must communicate at very high speed, which is feasible only in a centralized data center. That assumption dates back to the early 1990s and has persisted largely because of the convenience of keeping all the GPUs in one place.
Internet bandwidth is orders of magnitude lower than the bandwidth between GPUs inside a data center, which makes it hard to synchronize training across geographically distributed machines.
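To make that gap concrete, here is a back-of-the-envelope sketch of how long a single full gradient synchronization would take over different links. Every number below is an illustrative assumption (a 1.2B-parameter model with fp16 gradients, rough link speeds), not a figure from the episode:

```python
# Back-of-the-envelope timing for one full gradient synchronization over
# different links. All numbers are illustrative assumptions (a 1.2B-parameter
# model with fp16 gradients, rough link speeds), not figures from the episode.

PARAMS = 1.2e9
BYTES_PER_GRAD = 2                           # fp16 gradients
payload_gb = PARAMS * BYTES_PER_GRAD / 1e9   # ~2.4 GB to move per sync

links_gb_per_s = {
    "NVLink inside a server (~900 GB/s)": 900,
    "InfiniBand between servers (~400 Gb/s)": 400 / 8,
    "Home internet upload (~100 Mb/s)": 0.1 / 8,
}

for name, bw in links_gb_per_s.items():
    print(f"{name}: {payload_gb / bw:.3f} s per sync")

# Milliseconds on NVLink, minutes over a home connection -- and traditional
# training pays this cost on every single step.
```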
DisTrO allows GPUs to train independently and only share the most important insights, reducing the need for high-speed interconnects and enabling training over standard internet connections.
DisTrO reduces bandwidth requirements by a factor of 857 compared with traditional methods, making it possible for small teams and individuals to train models over peer-to-peer networks and democratizing AI innovation.
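For scale, a minimal sketch of what a factor like 857 means per training step. The payload sizes are assumptions for illustration, on the order of the figures in Nous's preliminary DisTrO report for a 1.2B-parameter run; they are not measurements from this conversation:

```python
# A minimal sketch of what a factor like 857 means per training step.
# The payload sizes are assumptions for illustration, not measurements
# from this conversation.

allreduce_bytes_per_step = 74.4e9   # assumed: ~74.4 GB with full gradient all-reduce
distro_bytes_per_step = 86.8e6      # assumed: ~86.8 MB with DisTrO-style updates

reduction = allreduce_bytes_per_step / distro_bytes_per_step
print(f"Bandwidth reduction: ~{reduction:.0f}x")   # -> ~857x
```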
The fear that major open-source AI providers might stop releasing models like Llama 4 prompted the question: 'Is there a way to make Llama 4 ourselves without 20,000 H100s?' This led to the development of DisTrO.
DisTrO trains models to equivalent quality while using roughly 857 times less bandwidth, so training can run over standard internet connections instead of specialized high-speed interconnects.
DisTrO could enable a global community to train AI models collaboratively, breaking the monopoly of large organizations with massive compute resources and high-speed interconnects.
While DisTrO reduces the need for high-speed interconnects, NVIDIA's CUDA stack and GPU hardware remain essential. The shift could, however, prompt a redesign of chips that emphasizes VRAM and processing power over interconnect bandwidth.
Traditional methods require all GPUs to synchronize after each training step, while DisTrO allows GPUs to train independently and only share key insights, reducing the need for high-speed communication.
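As a rough illustration of that contrast, here is a toy sketch in Python. It is emphatically not the DisTrO algorithm (which is not reproduced here); naive top-k sparsification stands in for "share only the key insights" purely to show the difference in communication pattern:

```python
import numpy as np

# Toy contrast between the two communication patterns. NOT the DisTrO
# algorithm: naive top-k sparsification stands in for "share only the key
# insights" so the structural difference is visible -- full-gradient sync on
# every step versus independent local steps plus a small exchanged summary.

rng = np.random.default_rng(0)
DIM, WORKERS, TOPK, LR = 10_000, 4, 100, 0.01   # assumed toy sizes

def local_gradient(w):
    # stand-in for a real backprop gradient on each worker's data shard
    return rng.normal(size=DIM) - 0.001 * w

def traditional_step(weights):
    grads = [local_gradient(weights) for _ in range(WORKERS)]
    bytes_moved = WORKERS * DIM * 4          # every worker ships its full fp32 gradient
    return weights - LR * np.mean(grads, axis=0), bytes_moved

def decoupled_step(worker_weights):
    summaries, bytes_moved = [], 0
    for i, w in enumerate(worker_weights):
        g = local_gradient(w)
        worker_weights[i] = w - LR * g                 # independent local update
        idx = np.argsort(np.abs(g))[-TOPK:]            # keep only the largest entries
        summaries.append((idx, g[idx]))
        bytes_moved += TOPK * 8                        # one index + one value per entry
    for i, w in enumerate(worker_weights):             # fold in everyone else's summary
        for j, (idx, vals) in enumerate(summaries):
            if j != i:
                w[idx] -= LR * vals / WORKERS
    return worker_weights, bytes_moved

_, sent = traditional_step(np.zeros(DIM))
print("traditional bytes per step:", sent)             # 160,000 in this toy
_, sent = decoupled_step([np.zeros(DIM) for _ in range(WORKERS)])
print("decoupled   bytes per step:", sent)             # 3,200 in this toy
```

In the toy, per-step communication drops from 160 KB to about 3 KB; DisTrO achieves a far larger reduction with a much more sophisticated scheme, but the shape of the win is the same.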
The community's willingness to contribute its GPUs and compute is crucial; DisTrO's success depends on turning that willingness into action, enabling decentralized training at a global scale.
In this episode of AI + a16z, Bowen Peng and Jeffrey Quesnelle of Nous Research join a16z General Partner Anjney Midha to discuss their mission to keep open source AI research alive and to activate the community of independent builders. The focus is on a recent project called DisTrO, which demonstrates that AI models can be trained across the public internet far faster than previously thought possible. Nous is also behind a number of other successful open source AI projects, including the popular Hermes family of "neutral" and guardrail-free language models.
Here's an excerpt of Jeffrey explaining how DisTrO was inspired by the possibility that major open source AI providers could turn their efforts back inward:
"What if we don't get Llama 4? That's like an actual existential threat because the closed providers will continue to get better and we would be dead in the water, in a sense.
"So we asked, 'Is there any real reason we can't make Llama 4 ourselves?' And there is a real reason, which is that we don't have 20,000 H100s. . . . God willing and the creek don't rise, maybe we will one day, but we don't have that right now.
"So we said, 'But what do we have?' We have a giant activated community who's passionate about wanting to do this and would be willing to contribute their GPUs, their power, to it, if only they could . . . but we don't have the ability to activate that willingness into actual action. . . . The only way people are connected is over the internet, and so anything that isn't sharing over the internet is not gonna work.
"And so that was the initial premise: What if we don't get Llama 4? And then, what do we have that we could use to create Llama 4? And, if we can't, what are the technical problems that, if only we slayed that one technical problem, the dam of our community can now flow and actually solve the problem?"
Learn more:
Check out everything a16z is doing with artificial intelligence, including articles, projects, and more podcasts.