The AI winter was more of an 'AI autumn,' where expectations fell but important foundational work continued. Researchers like Hinton, LeCun, and Schmidhuber made gradual progress, laying the groundwork for later breakthroughs in deep learning.
GPUs, originally designed for gaming, became crucial for accelerating the matrix math required by neural networks. This enabled the training of deeper and more complex models, which was a key factor in the deep learning revolution.
The democratization of AI allows independent researchers and small teams to apply powerful models to novel domains, leading to diverse and innovative applications. This approach bypasses the incentive structures of traditional academia and large labs, fostering creativity.
AlexNet demonstrated the power of deep learning on GPUs, significantly outperforming previous methods like SVMs. It marked a tipping point in computer vision and showed that deep neural networks could be effectively trained on large datasets, sparking widespread interest in neural networks.
The awards signal AI's growing impact across scientific disciplines, marking a 'crossing the chasm' moment where AI moves from niche technology to mainstream scientific tooling. It validates AI's role as a meta-discipline that benefits other fields.
Boltzmann machines are a type of neural network developed in the 1980s that use probabilistic rules inspired by statistical physics. They were crucial for learning complex probability distributions and finding hidden patterns in data, paving the way for modern deep learning techniques.
In the 1990s, AI research was often seen as stagnant, with limited practical success in neural networks. However, in hindsight, this period was marked by foundational work that set the stage for the deep learning breakthroughs of the 2010s.
The 'bitter lesson' suggests that general-purpose techniques like search and learning tend to outperform domain-specific, hand-engineered methods. This principle is reflected in the increasing reliance on computation and data scaling in modern AI research.
Transformers introduced the attention mechanism, allowing models to focus on relevant parts of input data and capture long-range dependencies. This was a significant leap from earlier models like Boltzmann machines, which were more limited in their ability to process complex data.
Universities often lack access to the compute resources and data engineering expertise needed for large-scale AI research. Bridging this gap requires better collaboration between academia and industry, as well as open-source tools that allow researchers to focus on domain-specific applications.
Welcome once again to the A16Z AI Podcast. I'm Derek Harris and I'm joined once again by A16Z General Partner Anjney Midha to dive into an interesting artificial intelligence topic. In this case, it's the spate of Nobel Prizes, five of them to be exact, in the fields of physics and chemistry awarded to AI researchers.
Because it's a great jumping-off point to explain how we arrived at our current state, we focus much more on the physics prize, which was awarded to John Hopfield and Geoff Hinton for early work on artificial neural networks dating back more than 40 years. Hinton, in particular, was also an instrumental figure in the deep learning movement of the early 2010s, which provided a direct line to today's foundation models and mass adoption of AI tools.
We discussed the connections between neural nets, computer science, and physics, why the last AI winter was more of an AI autumn that actually laid the groundwork for some huge advances, and how we might see other fields and scientific disciplines adopt the factory-like approach to building AI models that has proven remarkably effective for AI labs.
We end with a discussion of how to rejuvenate leading-edge AI research inside universities and the increasingly important role of independent builders, teams, and open-source creators in driving important systems-level advances in AI and in software in general. Enjoy.
As a reminder, please note that the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. For more details, please see a16z.com slash disclosures.
So Geoff Hinton and John Hopfield won the Nobel Prize for physics and, relatedly, Demis Hassabis and John Jumper from Google DeepMind won the prize for chemistry this year for some of their work on AlphaFold. So at a high level, Anj, what does it mean for the field of AI to see five researchers win the Nobel Prize in a single year? And at the risk of using a locally sourced cliche, is this confirmation that AI and computer science maybe overall is eating other scientific fields?
One school of thought would be that this represents a watershed moment for AI. It's hugely validating for AI's importance across different scientific disciplines, and it signals AI's growing impact and integration into a bunch of other fundamental research areas. And I think the contrarian view, or the opposing view to that, is that these awards were surprising,
because they sort of dilute the meaning of field-specific prizes and may reflect hype more than scientific merit. And maybe the Nobel Committee was jumping on the AI bandwagon, and actually they potentially overshadow much more important work in traditional physics and chemistry. I'm more sympathetic towards the former view, which is that I do think this represents a sort of crossing the chasm moment for AI.
It signals how AI is moving from niche technology to mainstream scientific tooling. And I find the explore versus exploit framework relevant here, where I think basically we're seeing the fruits of sort of decades of exploration in AI.
now being exploited across multiple scientific domains. And so I think it's a huge win, both for science and for AI, in the following sense: artificial intelligence and computer science in many ways is a meta-discipline. It's the study of general purpose computational methods that benefit specific fields and applications. I think it's very exciting that
we are recognizing the value of a meta-science like computer science and artificial intelligence to a bunch of other disciplines. So my hope is it ignites a lot more adoption of these tools in those fundamental science domains. And I think that we all win if it results in a lot more efficiency in the scientific method. Wherever you come down on that, I think the Nobel news is also a great catalyst to do a retrospective on maybe the past 40 years or so of AI, starting with Hinton and Hopfield.
They won for early work on artificial neural networks dating back to the early 80s. So can you explain to listeners who are unfamiliar with a Boltzmann machine just to start, like what that is and why that work was so important?
Sure. You know, the Boltzmann machine, like you said, was developed by Geoffrey Hinton and some of his colleagues in the 80s. And it's a type of artificial neural network that's based quite heavily on concepts from statistical physics, namely the Boltzmann distribution. One, it has both visible nodes for input and output processing, and it's got this idea of hidden nodes for an internal representation within the network. Two, it's a generative model. So it can learn and produce new patterns similar to its training data.
And three, it uses probabilistic rules for updating the node states of the network, inspired by the behavior of particles in statistical physics. And I think the core importance of the Boltzmann machine lies in its ability to learn complex probability distributions and find these hidden patterns in data without explicit programming.
And that was a pretty significant step towards more powerful and flexible machine learning models. And Hinton and his colleagues developed this learning algorithm for Boltzmann machines that, while elegant, was pretty computationally intensive. And so later they followed that up with a more efficient version,
like the restricted Boltzmann machine, RBM, which then became an important building block for modern deep learning. And I think there's again sort of two views on the value of the Boltzmann machine, right? One is that they were a crucial step in neural network development and they allowed for unsupervised learning because they could handle much more complex patterns than previous models.
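To make the contrastive divergence idea behind RBM training concrete, here is a minimal NumPy sketch of a restricted Boltzmann machine trained with one-step contrastive divergence (CD-1). The layer sizes, learning rate, and toy binary data are illustrative assumptions, not details from Hinton's original papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data: 200 samples over 6 visible units (purely illustrative)
data = rng.integers(0, 2, size=(200, 6)).astype(float)

n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # visible-to-hidden weights
b_v = np.zeros(n_visible)                              # visible biases
b_h = np.zeros(n_hidden)                               # hidden biases
lr = 0.1

for epoch in range(50):
    # Positive phase: hidden activations driven by the data
    p_h = sigmoid(data @ W + b_h)
    h = (rng.random(p_h.shape) < p_h).astype(float)

    # Negative phase (CD-1): reconstruct the visibles, then re-infer hiddens
    p_v = sigmoid(h @ W.T + b_v)
    v_neg = (rng.random(p_v.shape) < p_v).astype(float)
    p_h_neg = sigmoid(v_neg @ W + b_h)

    # Nudge weights toward data statistics and away from model statistics
    W += lr * (data.T @ p_h - v_neg.T @ p_h_neg) / len(data)
    b_v += lr * (data - v_neg).mean(axis=0)
    b_h += lr * (p_h - p_h_neg).mean(axis=0)

print("reconstruction error:", np.mean((data - p_v) ** 2))
```

The update is the cheap stand-in for the full Boltzmann machine learning rule: one reconstruction step approximates the model's own statistics instead of running the sampler to equilibrium, which is what made RBMs practical as building blocks.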
And the somewhat other side of the house would say, well, look, these Boltzmann machines were largely superseded by other techniques and their direct impact is pretty overstated. And the award to Hinton in particular might be more for his overall contributions than this specific work. I'm more sympathetic to the former in that I do think Boltzmann machines did represent a key standing on the shoulders of giants moment in AI history. They're definitely not widely used today,
They're still studied because they were a critical stepping stone in the field's development. And I think they helped researchers really understand the power of neural networks for learning and for generating data. They paved the way for later advancements in deep learning. I definitely consider that work sort of foundational to the development of modern AI techniques that are now used in various applications, right? Like image recognition and natural language processing and drug discovery.
I think we should view them as a very meaningful historical development, but they are definitely not widely used today. And I think modern AI has an almost parallel development track where I don't think it's controversial to treat them with the respect that they deserve.
in the historical development of the understanding of the bitter lesson, so to speak. But conflating Boltzmann machines with Hinton's more recent work, I think, is probably overextending their importance. And you mentioned Hinton's later work because, you know, it was almost 30 years between the Boltzmann machine and then 2012 with AlexNet, which also came out of Geoff Hinton's lab, this time at the University of Toronto. Can you kind of walk through the work that was happening during this time frame between, say, Hinton's Nobel Prize work and deep learning?
I think it's a fascinating question because there's definitely the status quo view from the period. Like if you look at the literature of that era, it often referred to the 1990s through the 2000s as a continued AI winter. The view would be that the field saw continued research
but very limited practical success in neural networks. And the focus shifted to other AI techniques like support vector machines and decision trees. In fact, when I got to grad school and was doing my machine learning coursework, I think that was still largely the view. But I think now in hindsight, if I was updating those priors today, I would actually say that that AI winter is overstated as a winter. And there was actually a ton of important foundational work
that did continue during that period. And so, the narrative of a winter and spring sort of oversimplifies the progress that was made. And I lean towards describing that period more as an AI autumn because I think it represented sort of this valley of despair where expectations fell but important work still continued. And so, the period between the initial excitement around neural networks in the 1980s and the deep learning breakthroughs in 2012 was marked by, to me, a bunch of really
important foundational contributions from folks like, of course, Hinton, but also Yann LeCun and Schmidhuber, because they continued to work on sort of neural network approaches during this time, making gradual progress. And I think a few of the milestones that come to mind for me was one, of course, the CNN moment, right, where Yann LeCun and others in the 90s came up with these convolutional neural networks that proved incredibly effective for image recognition tasks. And then shortly after that, Schmidhuber and, I'm
butchering his name, but it's Hochreiter, did this really important work around long short-term memory networks, LSTMs. And those sort of addressed this vanishing gradient problem in recurrent neural networks, so networks became much better suited for sequential data processing as opposed to discrete inputs. And then I think building on that, Hinton and a few others developed these techniques for pre-training deep networks layer by layer, which was the foundation for unsupervised pre-training.
The most notable benefit of those techniques was to help overcome the difficulties in training very deep networks, right? These were networks with several layers as opposed to much more primitive single neuron networks or single layer networks. That was enough of a body of work in sort of fundamental network architecture that then a number of folks in that sort of 2000s era did optimization algorithm work and regularization techniques to make training neural networks much more reliable, much more stable.
The star of the show became GPUs that could accelerate all the matrix math that these neural network computations need, starting with the 2007-2008 era of NVIDIA GPUs that were originally sort of gaming cards.
But it turns out they're fantastic at helping a lot of the matmul stuff go really, really fast. No individual technique during that AI winter, or as I prefer to call it, the AI autumn, led to widespread adoption of neural networks immediately. But together they set the stage for the entire deep learning revolution that we're in the grips of right now.
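As a rough illustration of that matmul speedup, here is a small timing sketch, assuming PyTorch is installed; it falls back to the CPU if no CUDA device is present, and the matrix size is arbitrary.

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n = 4096  # arbitrary size, large enough that the multiply dominates overhead

a = torch.randn(n, n, device=device)
b = torch.randn(n, n, device=device)

# Warm-up so one-time initialization doesn't pollute the measurement
_ = a @ b
if device == "cuda":
    torch.cuda.synchronize()

start = time.perf_counter()
c = a @ b
if device == "cuda":
    torch.cuda.synchronize()  # wait for the kernel to finish before stopping the clock
elapsed = time.perf_counter() - start

flops = 2 * n ** 3  # multiply-adds in a dense n x n matrix product
print(f"{device}: {elapsed:.4f} s, ~{flops / elapsed / 1e12:.2f} TFLOP/s")
```

Run on a GPU versus a laptop CPU, the same multiply typically comes out one to two orders of magnitude faster, which is roughly the gap that made training deep networks practical.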
What was it about AlexNet? Was it really just a culmination of, like you said, this decade of work, basically, and then realizing, oh yeah, GPUs actually help us, like you said, speed this up and make it a feasible thing to do? You know, I think the commonly held view would be that AlexNet demonstrated the power of deep learning on GPUs.
Those techniques vastly outperformed the previous techniques like SVMs, like we were talking about. And it marked the beginning of the current AI boom, especially in computer vision, because up until then, the most popular objection to neural networks was that they were just incredibly inefficient.
And I think what they got to draft off of was Moore's law. And I think what that demonstrated was this idea of the bitter lesson, which is we tend to overvalue efficiency over generalizability early on in a field. And actually being able to take advantage of general purpose techniques like matrix math.
is generally a pretty good bet. And that efficiency, if anything, is a normative judgment at a moment in time. And if we're going to critique a set of computational techniques as being inefficient, you have to always caveat that with a time duration and say, well, too inefficient for now. The opposing view would be that AlexNet's importance is pretty overstated. It was mostly an incremental improvement on existing techniques. There were a number of other proximal neural networks at the time. And that the focus on this single paper sort of overshadows other important concurrent
work that was happening at the time. I think that's a bit unfair. I do think it was a tipping point. And I think there are these seminal moments of research work that lead to a qualitative shift in the field's direction. And I think AlexNet really helped accomplish that. From a performance standpoint, AlexNet dramatically outperformed other methods on the ImageNet classification benchmark. I think it reduced the error rate
by double digits, north of 20%. From a scale perspective, it demonstrated that these deep neural networks with many layers, I think the original implementation had eight layers and millions of parameters, could be effectively trained on pretty large datasets. I think it showed that you could leverage GPU acceleration to train a much larger network than was previously practical, showing how advances in hardware could then enable even more powerful AI models. From a general application
perspective, the success of AlexNet on a pretty challenging real-world task like image classification meant that similar approaches could then work for other complex problems too, like self-driving cars and so on. From an algorithmic perspective on feature learning, AlexNet sort of learned these hierarchical features directly from data, reducing the need for hand-engineered features that were much more common in previous computer vision techniques. And so I think it really helped ignite interest in neural networks and kickstarted the current deep learning revolution.
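For a sense of the scale being described, here is a minimal sketch using the AlexNet definition bundled with recent versions of torchvision; the weights are left randomly initialized and the input batch is fake, just to show the parameter count and output shape.

```python
import torch
from torchvision.models import alexnet

# torchvision's re-implementation of the 8-layer architecture:
# 5 convolutional layers followed by 3 fully connected layers
model = alexnet(weights=None)  # random init, no pretrained weights downloaded

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")  # on the order of 60M

# A fake batch of four 224x224 RGB images, the ImageNet input size
x = torch.randn(4, 3, 224, 224)
logits = model(x)
print(logits.shape)  # torch.Size([4, 1000]), one score per ImageNet class
```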
And certainly personally for me, yeah, it was pretty exciting to be a machine learning researcher at the time doing my graduate work. It felt like we were in a new renaissance, right? Whereas just a few years before that, it was a pretty sleepy, cynical field of AI research. So people who follow the field and are in the field probably track the consistent progress of AI over that decade or so. But like ChatGPT seemed like another big inflection point in late 2022. And it seemed like the idea of generative AI, especially for text, kind of
came out of the blue for some people, or at that level. If you were following deep learning, there was a lot of computer vision. There was a lot of object recognition. The hot dog or not hot dog bit in Silicon Valley was like a funny moment that got thrown in there. And if you got it, you got it. But that's a huge difference between that and something like Ideogram today, for example, or Black Forest Labs and Flux, or even between early iterations of Siri
and Alexa. Like what was that jump that got us from those early deep learning days, where you could classify objects in a database, let's say, to today, where, there's no superlative, there are just amazing things you can generate? No, it's a good question because look, I think that it's quite common today to hear the view that, oh,
the jump to generative AI required sort of fundamentally new approaches, not just incremental improvements and the connection between early classification and modern generative AI is overstated. And I think that's a fairly myopic view of what happened. I think if you zoom out, it starts to become clear, like actually it was sort of very steady improvements in model size, data and techniques that led to increasingly sophisticated AI capabilities. And that, you know, generative models
pretty clearly evolved from discriminative ones as a natural progression. I would say that the biggest shift that happened in 2019, 2020, after the publishing of the transformer paper, was actually not a research shift in fundamental learning techniques or architectures at all. It was really an attitude shift towards research engineering.
It was how much compute does it make sense to throw at a training run to test empirically whether scaling laws would hold or not. And I think there was an increasing risk appetite and I think the dynamism, conviction, the imagination of a few folks in the industry to say, let's throw a thousand X more compute and see what happens. Hey, we might learn that actually the models, the loss curves don't actually converge and we had to throw that all away.
If you actually look from an architecture perspective, a research perspective, at the progression from image classification to today's generative AI models, you know, from the sort of hot dog or not to ChatGPT and GPT-4o, you can draw a pretty straight line between the dots in the middle. One was obviously
transfer learning. After AlexNet, researchers found that features learned by models like AlexNet could then be repurposed for other tasks, making it much easier to apply deep learning to new problems like self-driving cars. After that came this clear moment of progress around GANs, or generative adversarial networks, the first wave of which were proposed around 2014. And GANs provided a pretty neat framework for generating new realistic images.
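Here is a minimal sketch of that transfer-learning pattern, assuming torchvision is available: reuse an ImageNet-pretrained backbone, freeze its features, and train only a new head for a hypothetical 10-class task. The task, fake data, and hyperparameters are made up for illustration, and the pretrained weights are downloaded on first use.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Backbone pretrained on ImageNet; its learned features get reused
model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze the pretrained features so only the new head is trained
for p in model.parameters():
    p.requires_grad = False

# Swap in a classifier for a hypothetical 10-class problem
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on fake data
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 10, (8,))
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(float(loss))
```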
And then, of course, you have the attention mechanism and the transformer paper in 2017, which got a ton of attention, no pun intended, for adapting many of those techniques to natural language processing, by allowing models to then focus not just on features in computer vision, but on relevant parts of the input, and to capture long-range dependencies in NLP and sort of large corpuses of text.
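The mechanism itself is compact. Here is a minimal NumPy sketch of scaled dot-product attention over a toy five-token sequence; the dimensions and random projection matrices are stand-ins for the learned weights in a real transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output position is a weighted mix of all values,
    with weights given by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                  # toy sequence of 5 tokens
X = rng.standard_normal((seq_len, d_model))

# In a transformer, Q, K, V come from learned linear projections of X
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape, attn.shape)  # (5, 8) (5, 5)
```

Each row of the attention matrix says how much every other token contributes to that position's output, which is the "focus on relevant parts of the input" behavior described above.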
And that brought us to 2020 with the scaling laws showing that if you could increase model size, data and compute,
that you can very predictably improve model performance. And then I think, almost in sort of a poetic back-to-the-future moment, we're in the middle of this computer-vision-first multimodal era right now, where models that were originally developed for language processing are now turning out to be pretty good at handling multiple types of data, like text, images, audio, and video, simultaneously.
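As a toy illustration of what "very predictably" means in the scaling-law sense, here is a sketch that fits a saturating power law, L(N) = a·N^(-b) + c, to made-up loss-versus-parameter-count points and then extrapolates it; none of the numbers come from a real training run, and the functional form is just the commonly used shape.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(N, a, b, c):
    # Loss falls as a power of model size N, toward an irreducible floor c
    return a * N ** (-b) + c

# Synthetic (model size, eval loss) points -- illustrative only
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
loss = np.array([4.8, 3.9, 3.2, 2.7, 2.3])

params, _ = curve_fit(power_law, N, loss, p0=[10.0, 0.1, 1.5], maxfev=10000)
a, b, c = params
print(f"fit: L(N) ~ {a:.2f} * N^(-{b:.3f}) + {c:.2f}")

# The practical payoff of a scaling law: extrapolate to a model 10x bigger
print("predicted loss at 1e11 params:", round(power_law(1e11, *params), 2))
```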
For me, those were sort of the four or five big moments, right? Transfer learning, then GANs, then attention mechanisms and transformers. Then there were the scaling laws. Now we're in the sort of multimodal era. And then I think it's an homage, I think, to the iteration that scientific fields often require, as opposed to the common narrative that can build up, of instant revolution versus 20-year success stories or, you know, iterative evolution that then results in breakthroughs. How much do you think that productizing stuff or
otherwise releasing it to the world helps out? Because if I think of GPT-2, it made some waves back in, I don't even remember, like the 2019 timeframe. But then GPT-3, there was an API. And that was like, what, 2020? And people started playing with it.
I think that was a huge moment in the following sense. Like if you go back and read the mission statements or the vision statements of a lab like OpenAI, it says we are a research and deployment lab. But really for the first seven years, they looked more like a research lab, and the deployment was the quiet part. And I think that was because you could argue that
the research wasn't productized sufficiently in a way that was useful. You're right, the GPT-3 API had actually been around for a long time before ChatGPT came out. And I think what ChatGPT did was do a little bit of an interesting RLHF, you know, do the instruction tuning to take a base API, the GPT-3 sentence-completion API, and repackage the form factor into this assistant form factor. And not much else, actually. It wasn't like they trained a new model or anything. But sometimes the form factor is so important for the world to realize the usefulness of a general purpose technology, like a language sequence prediction model or a next token prediction model. I actually think there were two other things that were almost more important in the deployment of those systems. The first was when they launched ChatGPT, they gave away the inference for free.
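As a purely hypothetical illustration of that form-factor point (none of this is OpenAI's actual template or API), the sketch below wraps the same stand-in next-token predictor first as a raw completion interface and then as an assistant-style chat interface.

```python
def base_model_complete(prompt: str) -> str:
    """Stand-in for a next-token / sentence-completion model."""
    return prompt + " ... <model continues the text here>"

# 1) Raw completion form factor: the user has to phrase everything as text to continue
completion_prompt = "The three most cited papers in deep learning are"
print(base_model_complete(completion_prompt))

# 2) Assistant form factor: the same predictor behind a chat-style template
def assistant_reply(user_message: str) -> str:
    chat_template = (
        "You are a helpful assistant.\n"
        f"User: {user_message}\n"
        "Assistant:"
    )
    return base_model_complete(chat_template)

print(assistant_reply("What are the three most cited papers in deep learning?"))
```

Instruction tuning and RLHF are what make a real model actually behave well inside the second template, but the packaging itself, the same engine behind a different interface, is the point being made here.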
And we're not talking like a million bucks of inference. I think the last estimate I heard was that they gave north of $50 million in free inference away to the world. That was really critical for the deployment of these systems because while there were other people building AI companions or chatbots on top of GPT-3, it was just way too expensive for most people to try it. And the early wave of people who found product market fit or found a lot of usefulness with these systems were like students, high school students using it for their homework.
And students don't have a lot of money sitting around to try new AI tools until it's clear it's useful, right? And so I think one was they deployed it with free inference, and that gave a lot of the world a chance to try out the model. And I think the number two value of doing that was it spurred a lot of other organizations to try increasing the compute that they were throwing at training runs. And that increased the amount of investment going into training models. And I think that was actually pretty important.
Because up until then, it's sort of this chicken and egg problem, right? GPT-2 is good enough at some of these story tasks, but will it be good enough at real use cases that help real people? And you're like, well, you won't really know until we try scaling it up.
Well, what happens if you throw a thousand X compute at it and it doesn't work out? You've just burned all that capital. And so I think the value of ChatGPT was in saying, well, let's try scaling up these networks empirically. Because remember, unlike Moore's law, scaling laws are empirical. They're not predictive. So you kind of don't know if they're holding up until you try them. So you mentioned GANs, which were, yeah, a decade ago, I think, was when that paper first came out. It was eight years, give or take, until we started seeing DALL-E and Stable Diffusion and stuff
hit the market. It seemed like it was maybe a little shorter between Transformers and ChatGPT, at least, you know, five years,
give or take. I'm just curious if that's a normal timeline. And if it is, what should we expect to see? How should we expect progress to happen in the years to come? Yeah, the pace of progress question is an interesting one. So I think, again, a commonly held view would be, or one I hear often is that, oh, AI research-to-production cycles are accelerating due to increased funding and compute power. And we can probably expect continued rapid progress and deployment of new AI technologies. I often see this sort of like
exponential curve that says, "You're standing over here and two years from now, we're going to keep seeing this exponent." The opposing view would be that the easy wins have actually all been made and the future breakthroughs will probably require more time and fundamental insights and the low-hanging fruit has been exhausted and current progress may actually hit diminishing returns leading to a slowdown. I like to believe the following, which is that sometimes an exponent can be approximated by a series of stacked sigmoids.
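A small numeric sketch of that stacked-sigmoid picture: summing a few logistic curves with staggered midpoints and growing ceilings gives a curve that, over the right window, tracks a smooth exponential reasonably well. The constants are arbitrary, chosen only to make the comparison visible.

```python
import numpy as np

def sigmoid_wave(t, midpoint, height, steepness=1.5):
    # One S-curve: a capability wave that ramps up and then plateaus
    return height / (1.0 + np.exp(-steepness * (t - midpoint)))

t = np.linspace(0, 12, 13)

# Three stacked S-curves, each later wave with a bigger ceiling
stacked = (
    sigmoid_wave(t, midpoint=2, height=1)
    + sigmoid_wave(t, midpoint=6, height=4)
    + sigmoid_wave(t, midpoint=10, height=16)
)

exponential = 0.5 * np.exp(0.3 * t)  # a smooth exponential for comparison

for ti, s, e in zip(t, stacked, exponential):
    print(f"t={ti:4.1f}  stacked={s:6.2f}  exp={e:6.2f}")
```

Zoomed out, the sum looks like steady exponential progress; zoomed in, you are always somewhere on one sigmoid, either ramping or plateauing.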
And I actually think where we are today is a great example of this, where like the step function difference between GPT-3 and GPT-4 was dramatic. Clearly, that felt like we were on the exponential part of the curve, whereas GPT-4o has been somewhat more sigmoidal. It's not like it's been orders of magnitude more useful than GPT-4 was.
And I think that's okay because often what you need is a period of sigmoidal stabilization for people to realize and learn what did work and did not work in the exponential part of the curve. And then you see a bit of a plateau, and then you see another stacked sigmoid. And what's so beautiful about being in generative modeling today is that, even though we might be on the sigmoidal part of the language model curve,
We're clearly in an exponential part of the curve, as you said, on like image generation or video models or audio. But I do think we'll probably hit a sigmoidal part of that curve pretty soon. But I think the important point is if you zoom out enough, it's so clear that the pace of progress, particularly in technology,
in the application of deep learning to new scientific discoveries, is orders of magnitude faster today than it was a decade ago because of the increasing investment in, and acceleration of, these techniques. And I think
the Nobel prizes actually signal that. How do you think it changes things? The fact that there's just so much money being pumped into this right now, and we've juiced the ecosystem with like so many GPUs, right? And compute is such a fundamental part of it. I mean, does that help artificially boost things a little bit, in the sense that, if you have an idea, you can probably find money to fund it? I sort of go back to the bitter lesson, right? Which is that over the 70 or so years of computer science history, what we've learned is that general purpose techniques outperform
sort of more specific, domain-specific, hand-engineered techniques. So methods that scale with computation, like search and learning, are pretty good bets to take in fields where the rate of progress has largely been bottlenecked
on the application of those general purpose methods, I think it's a pretty safe bet. I don't think it's an artificial booster, so to speak. I think it's a pretty reasonable bet to say, you know, in protein design or protein folding, one of our fundamental limitations has been computation, and so it's a pretty reasonable idea to apply deep learning there. I think in other domains, the rate of progress is not bottlenecked
on a general purpose technique like search or learning. And a good example is in biology, right? Especially in wet lab biology, the fundamental rate limiter is literally running experiments in a physical lab and proving out a hypothesis that your machine learning model said should work. Like ultimately, your model can tell you, here's this new protein
or a new molecular structure that would be pretty good at solving a particular condition or a disease. But then you've actually got to go run that experiment in the lab and prove it out empirically. And I think in those places, there might be an over-application of deep learning, where actually the fundamental rate limiter is just doing science in a wet lab.
Maybe you can help explain the difference. My understanding is that AlphaFold, the protein modeling model that DeepMind built, and again, people who worked on that won Nobel Prizes for chemistry this year, as people probably realize. How does that compare and/or contrast to a transformer that you might see powering an LLM or something? Because my understanding is it's a transformer-based model. So I'm curious, architecture-wise,
or even from a training point of view, how that might look different. Yeah, look, this is a pretty hot button topic for the research field to debate. And depending on who you ask, you'll get three different answers if you ask three different people. One side of the house will tell you, oh, these are completely different techniques, you know, conflating an autoregressive language model like GPT-4 with a 3D structure prediction model like AlphaFold is like saying, you know, apples and oranges are all fruit, and that's too high a level of generalization to have a useful debate about. On the other hand, if you
ask some researchers, they'll tell you actually diffusion is just a form of autoregression. In fact, a really good recent blog post from Sander Dieleman at DeepMind, titled "Diffusion is spectral autoregression," argues basically that if you do a little bit of signal processing on these models, it reveals that diffusion models and autoregressive models really aren't that different. That diffusion models of images
basically perform approximate autoregression in the frequency domain, just a different domain. And so, long story short, I think this is an evolving question we don't really know the answer to, but they all share enough roots in the neural network architecture
and the bitter lesson of trying to curate usefully diverse data about the world, and then throw a model at it to understand both the explicit and interstitial representations about that domain such that you can get much, much more useful predictions out on the other end. So frankly, I think these techniques are all sort of more similar than they are different.
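A small NumPy sketch of the spectral intuition behind Dieleman's argument: image-like signals concentrate their power at low spatial frequencies, so as diffusion-style Gaussian noise is added, the fine detail falls below the noise floor long before the coarse structure does, which is what gives denoising its coarse-to-fine, autoregression-over-frequencies flavor. The synthetic image and noise levels below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 128
N = size * size

# Synthesize an image-like signal: white noise shaped to a roughly 1/f spectrum,
# which is how natural images tend to distribute their power
fy = np.fft.fftfreq(size)[:, None]
fx = np.fft.fftfreq(size)[None, :]
freq = np.sqrt(fx ** 2 + fy ** 2)
freq[0, 0] = 1.0  # avoid dividing by zero at the DC component
image = np.real(np.fft.ifft2(rng.standard_normal((size, size)) / freq))
image /= image.std()  # normalize to unit pixel variance

def band_power(img, lo, hi):
    """Average spectral power of img in a band of spatial frequencies."""
    power = np.abs(np.fft.fft2(img)) ** 2
    mask = (freq >= lo) & (freq < hi)
    return power[mask].mean()

signal_low = band_power(image, 0.01, 0.05)   # coarse structure
signal_high = band_power(image, 0.30, 0.50)  # fine detail

# Gaussian noise of std sigma adds a flat spectrum of roughly N * sigma^2 per
# coefficient, so the high-frequency band gets swamped at much smaller sigma
for sigma in [0.1, 0.5, 2.0]:
    noise_floor = N * sigma ** 2
    print(f"sigma={sigma:3.1f}  low-freq SNR={signal_low / noise_floor:9.2f}"
          f"  high-freq SNR={signal_high / noise_floor:7.3f}")
```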
I think while the last couple of years have tended to discuss AI in modality-specific ways, image model, text model, language model, those barriers are breaking down now. And it's so clear that we're in this multimodal world where often what matters way more is what kinds of data you collect and how you process that data into a latent space
than actually what specific kind of architecture you use to learn those patterns. Do you think that portends a future where more people try to tackle these types of problems as models become more general? And maybe even as other fields start to prioritize data standardization and collection, or otherwise put the priority on amassing high quality data, does that, do you think, open up the field to more and more people doing that kind of research? I mean, it's obviously important. It's obviously valuable, I think.
anyone wants to have their name on, like, solving medical
or biological problems. I'm somewhat biased because we spend so much time talking to people who have already, so to speak, come to this realization that computation is just a fundamental approach to doing fundamental sciences. If you're trying to do science without leveraging Moore's law, without leveraging the bitter lesson, without leveraging advanced expert level computational systems,
you're probably going to make less progress than somebody who is. We've already seen, actually, for a while now in economics that the vast majority of useful discoveries don't come from a Milton Friedman-esque armchair theorization of how the markets work, even though that was very useful at that time. It's increasingly quantitative in nature. It's quantitative economics, right? I like to look at historical analogies of how disciplines have sort of evolved. You know, in physics, you could argue that
originally, the vast majority of physics breakthroughs were about the study of the large, right? Which is astrophysics and Newtonian physics and so on. And then we had a period in the 1900s, which was largely the physics of the small: particle dynamics and electrons. And now we are squarely in the time of the physics of the many: statistical mechanics and condensed matter physics, which try to explain how complex emergent systems interact.
And I don't think it's a coincidence that many of these breakthroughs in modern AI, like scaling laws, came from folks with backgrounds in physics. Because neural networks and these computational techniques are extraordinary at helping folks study the physics of the many. And that's where I think a lot of frontier work, a lot of white spaces in fundamental science is understanding how these complex systems interact.
We are seeing and I hope we will continue to see many more researchers pushing their own fields forward by leveraging these modern advances in computation because they help you study emergent systems with way more efficiency than we've had in the past. I think the rate limiter on that frankly is not compute as a lot of people like to talk about. It's that
access to tools, the right tools, especially software engineering tools, is quite limited in academia, especially in fundamental sciences. And so we do need to see more collaboration between computer science and other fundamental disciplines. That's what drew me to bioinformatics, because it was just so clear that using unsupervised learning on large unstructured data sets like physician notes and longitudinal EHR records was yielding much, much more precise diagnoses
for things like terminal conditions than having a physician, like a human, just try to reason about that without those tools. And I think that there's a growing divide between organizations who get that and have invested in those resources to build large-scale AI systems and those without. And a lot of academic labs today, especially in the fundamental sciences, don't have those resources. And so that's a real problem. But I think it would be over-rotating to go say, well, okay, then the physics lab at Stanford should basically turn into a computer science lab
or a data engineering lab, right? I think that fundamental research in algorithms and architectures remains really crucial for long-term progress in these fields. And I think that the role of universities and non-commercial entities is pretty important. I think it's more about how do you complement those skill sets, those fundamental research skill sets with the engineering expertise we're talking about to make sense of data at sufficient scale. That's the problem. How do you think we help universities and other non-commercial entities actually step up
their contributions? I mean, this could be a little broader, just AI in general, but, using "all" very broadly, all the compute, all the data, and a large portion of the talent resides within large, well-heeled, well-funded commercial labs. How do we get these universities back into a state where they're making meaningful contributions?
Yeah, look, so I think this is a great question. I think it's also a super unresolved question. It's a topic of active debate, but there's a few promising threads. The first is programs that allow university labs and students to access the kind of compute and data resources that are currently only locked up inside of industry labs. And I think actually a great success story here is the CIFRE program in France, which is kind of a secret weapon I think other countries should adopt like immediately,
which is that there's a government-sponsored program in France that allows academic PhD candidates to actually complete a lot of their doctoral work at industry labs. Obviously, an example of this that's close to home for us is Guillaume Lample at Mistral, who was enrolled for his PhD at a university in France
while completing a ton of his doctoral research as a researcher at Meta working on Llama. And what that's allowed many great up-and-coming young French researchers to do is not have to compromise between doing interesting fundamental research and having access to the resources required to make those contributions.
I think it's a great regret of mine that we don't have that ability in the United States for researchers at leading research institutions, you know, at Stanford and so on, to be able to complete their work at an industry lab while continuing their PhDs. I think it's a great tragedy. So that's one. I think we can resolve that tension through good and thoughtful policymaking. I think the second is for many of these labs to be able to leverage open source.
I'm very glad that, relative to a year ago, there are just a number of really strong open source base model options for researchers in any scientific field to leverage, where they can focus instead on fine tuning and adapting those models to their domains rather than having to pre-train those models from scratch. And I think that's fantastic. And then I think the third, which is a little bit more unclear, is the data engineering problem, which is if you're a really great physicist or an economist and
you want to leverage AI in your field, even if you have access to Llama and a ton of compute, a lot of the challenges are actually sort of software engineering challenges around data engineering, data processing, data acquisition, data massaging. And I think that one's an unclear problem. It's something I'm talking about actively with a number of folks in the field, about how we bridge that gap.
And I think step one might be taking a page from what has worked in the past, where you often had this idea of secondments or sabbaticals, where leading talent at places like Bell Labs could do a kind of tour of duty with an academic lab,
almost as if they were on loan. And I think that's a very interesting kind of approach, but it's early, it can be messy. It kind of requires often an alignment of a lot of stars for it to make sense for someone who's a really talented, call it data infrastructure engineer, to take themselves out of the center of the action in the middle of an industry lab and go work at an academic lab. But I think it's very promising. I think it's quite impactful. If you're really great at doing large-scale data processing,
at Databricks and you can go help a few biologists unlock the next Nobel Prize, that's a pretty appealing opportunity, I think, at least. And so I think it's mostly, in that sense, about crafting the right program for folks like that. The other thing I want to discuss quickly in closing is,
It does seem like as access to compute and knowledge becomes more available, we've seen a shift in terms of how some innovations come to be, including in AI, with individual hackers and small teams and open source projects, like you mentioned, playing a more prominent role. I'm curious how big of a role you think those types of teams and individuals can play going forward? And then, too, how do we keep them excited
and keep that sort of ground level or grassroots innovation research happening, considering all the hype and all the drama and everything that surrounds AI right now? Yeah, yeah. I think this is a great question because it can be easy to conclude that the most impactful AI research still requires resources beyond the reach of most individuals or small teams, right? And that open source contributions, while valuable, are...
unlikely to match the breakthroughs from well-funded labs. I've even heard some dismissive folks call it cute and undermine the value of those. But on the other hand, I think that you could argue that open source and individual contributions are becoming
increasingly more important in AI development. I think that the democratization of AI will lead probably to more diverse and innovative applications. And I think in particular, the reason we should expect an explosion in home scientists, folks who aren't necessarily affiliated with a top tier academic or for that matter, industry lab, is that as
the tools, the open source models, get more and more accessible, the rate limiter really is the creativity of somebody who's willing to apply the power of that model's computational ability to a novel domain.
And there are just a ton of domains and combinatorial sort of intersections of different disciplines that are a blind spot for traditional academia. Because, you know, if you're trying to become a published academic in biology, it's not particularly rewarding to try to veer off the publish-or-perish path and
conference circuit. And if you're at a large industry lab and you're not contributing directly to the next model release, like it's not that clear how you get rewarded, right? And so being an independent actually frees you up from the incentive structures, I think, of some of the larger labs. And if you get to leverage, you know, the millions of dollars that the Llama team spent on pre-training, applying it to data sets that nobody else has perused before, it results in pretty big breakthroughs. Just circling right back to the Nobel Prize, and I guess, you know, the Turing
Award would be the AI or computer science equivalent. Again, considering just how much work right now is coming out of large labs and large companies, do you see a time in the future where, you know, maybe we see a Turing Award winner or a Nobel Prize winner or whatever from a place like Nous Research or from some open source project that takes off? I mean, if I think about open source projects over the last couple of decades, there are a handful that seem like they should be rewarded and recognized in some capacity. So
Yeah, well, you're right. So the commonly held view would be that the major awards will, of course, continue going to researchers from established institutions or large tech companies. By the way, the fact that I'm citing large tech companies as the status quo, the common view, is kind of crazy, but here we are. And that the infrastructure and resources needed for breakthrough work are beyond most open source projects. And look, I think the contrarian view would be that actually, as AI becomes more democratized, especially if
the pace of open source continues, we will see groundbreaking work emerge from unconventional sources, that, you know, these open source collabs could produce Nobel-worthy advances, similar to Linux in operating systems. Like, Linux is a Nobel Prize-worthy achievement in my mind. Right. Right. And so I lean towards the opposing view. And while it might be unlikely in the near term, I think the long tail of innovation in open source could very well produce these award-winning breakthroughs. And I think that reflects the wisdom-of-crowds principle, right? Applied to
scientific research. If you have these tools applied to enough combinatorially novel new areas, it's only a matter of time before we see advances that are Nobel worthy or Turing worthy. I think in every field at the intersection of two disciplines, if you have an independent researcher who basically has the creativity to take an open source model and apply it to a novel data set, we're going to see a bunch of breakthroughs.
That's it for this episode. If you enjoyed it, if you learned something, or if it struck a chord some other way, please do rate the podcast wherever you listen. Until next time, thanks for listening.