Bill Dally: NVIDIA’s Evolution and Revolution of AI and Computing (Encore)

2025/1/16

Generative Now | AI Builders on Creating the Future

People: Bill Dally

Topics
Bill Dally: I first encountered neural networks as a graduate student at Caltech, but the limited compute of that era held them back. Later I worked on parallel computing and stream processing, and partnered with NVIDIA to bring those techniques to GPU computing, making parallel processing far more accessible. While teaching at Stanford I watched autonomous driving take shape and came to appreciate the importance of data and statistical methods. After joining NVIDIA I realized GPUs were the ideal platform for running neural networks and helped drive technologies such as CUDA and cuDNN. The speed of generative AI's rise has exceeded my expectations, but there is no question it will transform human life. Data is not an immediate obstacle for AI, because there is still a great deal of private and synthetic data to draw on. As NVIDIA's chief scientist and senior vice president of research, I push for technical improvements and lead the research organization, developing new technologies and deepening our understanding of how they work. We are applying generative AI to autonomous vehicles, both to create training environments and to improve perception, planning, and prediction. We are also developing new techniques to stay ahead in GPUs, including better numerical representations for AI, handling sparsity in models, and making the platform more efficient. Progress in autonomous driving is mostly about handling rare cases and ensuring safety, and generative models play an important role there. NVIDIA uses generative AI to make chip design more efficient, including large language models that assist designers, better debugging workflows, and optimized circuit design. We attract and retain talent by setting a high bar, creating a good working environment, and offering high-impact work. Staying ahead means being the one driving the change, holding core technical strengths, and responding nimbly as applications shift. I advise new graduates to choose jobs where they can learn new things, work with excellent people, and enjoy a good culture. Talking with the younger generation of AI practitioners lets me see new technologies from different angles. Future students may no longer need to learn traditional programming, focusing instead on integrating and applying APIs.

Michael Mignano: As the host, Michael Mignano guides the conversation, asks questions, and summarizes and responds to Bill Dally's points. He leads Bill Dally through a review of his career, the development of AI technology, and NVIDIA's research and contributions in AI.

Chapters
Bill Dally's career is a testament to the evolution of AI and computing. Starting with early neural network experiments in the 1980s, he transitioned through parallel computing and stream processing, culminating in his pivotal role at NVIDIA. This chapter details his journey and the factors that led to the current AI revolution.
  • Early neural network experiments at Caltech in the 1980s
  • Development of stream processing technology at Stanford
  • Partnership with NVIDIA in developing CUDA
  • Role of chief scientist and senior vice president of research at NVIDIA

Transcript

Welcome to Generative Now. I am Michael Mignano. I am a partner at Lightspeed. NVIDIA is undeniably a cornerstone of the AI revolution. Their groundbreaking GPUs are the workhorses of modern AI research and development. NVIDIA also made some major announcements at CES this year. And that's why I'm revisiting a conversation I had with Bill Dally. Bill is the Chief Scientist and Senior Vice President for Research at NVIDIA.

He is one of the most forward-thinking minds when it comes to computer hardware and architecture. His decades-long career started in academia at Caltech, then MIT, before later becoming the chair of the computer science department at Stanford before transitioning to NVIDIA.

We talk about his early experiences in the 1980s, playing around with neural networks at Caltech, the pace of the AI evolution, and why he believes that AI is the technology that will revolutionize all human endeavors. So check out this conversation I had with Chief Scientist and Senior Vice President for Research at NVIDIA, Bill Dally.

Hey, Bill. How's it going? It's going well, Michael. Thank you so much for doing this. Really, really appreciate the time. Very excited to talk to you. I've been looking forward to this one for a while. You obviously have an incredibly impressive background and role at NVIDIA.

There's so much we could get into about NVIDIA and the state of AI and GPUs and all of the research that you and your team of, I believe, hundreds of researchers are working on. But maybe before we get there, like I said, you have such an impressive career. I think the audience would love to hear a little bit about what you've done over the past several decades across academia, entrepreneurship, business, and now

this role as chief scientist of NVIDIA. Give us a little bit of the background of your story. - Okay, what's relevant to AI and the like probably started when I was a graduate student at Caltech. This was 40 years ago in the 1980s. I took a course on neural networks, and I thought that was just a really cool technology. We built little multi-layer perceptrons and ConvNets and these things called Hopfield nets that were little associative memories.

But it also impressed on me that it was a toy: a great technology, but the compute wasn't there at the time. But that was a formative thing. And then later on, I was a professor at MIT and I was building parallel computers. And it kind of struck me that parallelism was the way to scale performance in a way that you couldn't do with serial processors anymore.

But at the same time, existing software had huge inertia. You know, with Moore's Law in effect then, and I mean the Moore's Law about serial processor performance, not about transistor counts,

People could just wait, and, you know, every 18 months or so, the performance of their computers would double. And so why rewrite all your software? You know, if you went with parallel computing, your performance would go up by a factor of four. If you just wait, it goes up by a factor of two, and that's just too easy a path to compete with. So it wasn't really until that ended that parallel computing took off. Until Moore's Law ended.
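
A rough back-of-the-envelope illustration of that trade-off (not from the conversation; it simply plugs in the 18-month doubling and the one-time 4x figure mentioned above):

```python
# Back-of-the-envelope: "just wait" on Moore's Law vs. a one-time 4x parallel rewrite.
# Assumes serial performance doubles every 18 months, as described above.
for month in range(0, 61, 12):
    serial_speedup = 2 ** (month / 18)      # free speedup from waiting
    print(f"month {month:2d}: waiting gives x{serial_speedup:4.1f}, parallel rewrite gives x4.0")
# At 36 months the waiting curve reaches exactly 4x, which is why a one-time
# parallel rewrite was a hard sell while Moore's Law still held.
```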

Yeah. And so then, you know, in the early 2000s, when I was on the faculty at Stanford, we developed this technology called stream processing, which is a way of really making parallel processing more accessible by managing the data movement in a very effective way. And we partnered with NVIDIA on the development of the NV50, which came to market as the G80, to make that technology broadly available in the form of CUDA.
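
A minimal sketch of what explicit data movement plus broadly accessible GPU parallelism look like from the programmer's side today (illustrative only; it uses PyTorch rather than raw CUDA, and the shapes are arbitrary):

```python
import torch
import torch.nn as nn

# Illustrative only: explicit host-to-device data movement plus a data-parallel
# operation, the kind of workload CUDA made broadly accessible. When CUDA is
# available, PyTorch routes the convolution through cuDNN.
device = "cuda" if torch.cuda.is_available() else "cpu"

images = torch.randn(8, 3, 224, 224)        # data starts in host (CPU) memory
images = images.to(device)                  # explicit movement to GPU memory
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)

features = conv(images)                     # runs in parallel across GPU cores
print(features.shape, features.device)
```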

Now, another thing that was going on about the same time when I was on the faculty at Stanford, I was chair of the computer science department when Sebastian Thrun won the Grand Challenge for an autonomous car to drive itself across, you know, the desert from, what is it, Barstow to Las Vegas or something like that. And the technology that made that work, I remember, you know, going to one of Sebastian's meetings and they were talking about how they're having trouble having their car tell the difference between the road and the desert.

And it's actually harder than it seems because those dirt roads, they're dirt and the desert is dirt. And how do you tell one dirt from the other dirt? And, you know, they had the smartest graduate students trying to code up manual feature detectors to do that. And it wasn't working. And so they just acquired a lot of data and they used statistical methods to do it.

It wasn't neural networks at the time. Again, the compute wasn't quite there for that. I'm trying to remember what it was, but it was a way of automatically discovering features by mining lots of data. And it just struck me that that was a very powerful technology. And a few years after that, after I'd left Stanford and joined NVIDIA about 2010, I had a breakfast with Andrew Ng. And at the time, he was working on Google Brain,

you know, finding cats on the internet using 16,000 CPUs. And it struck me, okay, you know, that's a lot of compute power, you know, I should say it's a lot of expense for that compute power, but we've gotten there. These neural networks that I played with, you know, back in the 1980s, we finally have the technology to make them real. And it also struck me that CPUs aren't the way to do this, right? What we should do is get the stuff running on GPUs. So I got somebody at NVIDIA Research to go,

to port his cat-finding code to GPUs. And that code ultimately became cuDNN. That's kind of the path I took, starting in the academic world at Caltech and MIT and Stanford, sort of seeing all the pieces come together, the original neural network technology, the parallel computing, evolving that into stream processing, into GPU computing,

And ultimately converging on where we are today, building the engines that are basically powering this revolution in AI. And these engines, obviously, there's enormous demand, you know, unlike anything I could have ever imagined even just, you know, 18 months ago. One of the questions I like to often ask people sitting in your seat on this podcast is,

Did you expect what has happened over the past 12 to 18 months? Obviously, you've been thinking and working on this stuff for the past couple of decades. But did you even know what we were about to experience, you know, through the explosion of generative AI? And I didn't expect it to happen this quickly. OK, so I was convinced that this was the technology that's going to revolutionize

all of human endeavor, right? You know, how we play, how we work, how we, you know, educate, how we get medical care, everything about, you know, life would be profoundly affected by AI. And I knew that was going to happen, but I thought the change was going to be more gradual and not quite as frenetic as it's turned out to be. And it's interesting because it was a slow start. You know, things were moving along. You were seeing lots of

applications of AI, starting with the ConvNets back, you know, 10 years ago, maybe 12 years ago. People were starting companies, you know, in agriculture, to tell what is plant, what is weed, and squirt the herbicide on the bad one. And, you know, it was happening, there was growth, but then, you know, when ChatGPT came out, it was like somebody turned the rate knob way up and things just

became a lot faster. I was not expecting that. Could data be a limitation at some point in the near future? You know, it certainly is one of the key ingredients that you need to make this work. But there's so much that can be done, you know, to both mine private repositories of data that many companies have that has not been addressed yet, and also to create synthetic data, which we found very effective, you know,

in many applications. Now, I don't see that as an immediate concern. I think that there's going to be

plenty of data, both on the private side and on the synthetic side. The usual thing people have done is kind of scraping the web and getting data, and yeah, they may be nearing the limits of what can be done with that, but there's a whole lot more data out there. Yeah, makes sense. So talk us through, what does it mean to be the chief scientist at NVIDIA? You know, give us a day in the life of Bill Dally and your team. The world's most fun job, I think.

So I get to do a lot of interesting things. You know, my role is as chief scientist and senior vice president of research, and it's actually two distinct jobs. So as chief scientist, my job is really to kind of poke my nose into everything going on in the company and try to make the technology better.

And so, whether it's, you know, I'll attend the meetings on planning for the next generation of GPUs, for autonomous vehicle projects, for robotics projects, and try to stay up to date and connect people: oh, there's somebody at this university doing something really exciting that could make this better, let's take a look at that. Or, you know, maybe we should be pushing harder on a new packaging technology for the next generation of GPUs. I'm just trying to push people out of their comfort zones a

little bit, get them to try things that could make stuff better. And then the flip side is I run the research organization, which is like a giant playground. We get smart people from all over doing exciting things, ranging from circuit design on what I call the supply side of the research lab, supplying technology to make the GPUs better. And then we have people doing

you know, all sorts of AI, autonomous vehicles, graphics, robotics on the demand side. And it's just, you know, it's fun to meet with these people. They're smart people. Talk to them about, you know, what ideas they're doing. And I try to, my job is to get obstacles out of their way. I try to enable them by finding out what's blocking them and remove the blockages so they can do amazing things.

Yeah, so it almost sounds like the chief scientist part of your job is really about thinking about the future and planning for the future. And then the research org is about, hey, what is the research we can be doing now to make the technology and the product offering better for our customers? Is that a good way to summarize? Yeah, it's a really great way to summarize. So the two fit together. And very often, you know, for the chief scientist job, thinking about the future, we try to identify gaps. We try to have this vision of

you know, where we want things to go with, you know, both the GPU hardware, the software, the applications. And we say, why can't we do that today? And on the research side, we try to fill those gaps. We try to, you know, what technology can we develop that will make that possible? Got it. So maybe starting with the latter on the research side, the research org, like what are some of the things that are most exciting to you and the team right now, either areas of focus or

or specific papers or bits of research that you're working on right now? Yeah, well, generative AI has to be kind of the most exciting thing going on. And so we're trying to develop new technologies for that and trying to develop some fundamental understanding of it as well. Our research group in Finland wrote a paper a little while back

that basically laid out how diffusion models really work and actually in the process made how they are applied much more efficient.

And so that's the kind of thing we do is try to, you know, look at the technology that everybody has jumped on, sometimes without really understanding how it works, and try to dig down and understand, you know, what makes it tick and how we can make it better. We're doing really exciting things, you know, across the generative space, both of language models, you know, with vision and video models, and probably most exciting is multimodal models that

Yeah. Combine all of that stuff together. And it's fun to watch it happen. And there's a lot of energy there. People are pretty excited about it.

Yeah. I mean, on the topic of multimodal models, you know, we're recording this just a few days after the recent announcement from OpenAI with GPT-4o. I mean, I think that's an example of what you're talking about. Super impressive, super impressive to see these things come together. Right. You know, again, I'm not an expert on the technology, but my understanding is, you know, this is the same approach that we've seen applied

to these large language models and other types of models over the past couple of years. But now when you put these things together into one, just enables a whole other type of interaction and experience. That's pretty mind-blowing. Yeah. And it also opens up the data space

to orders of magnitude more. You asked this question about data. Yeah, where do you get the data? When you're just dealing with language, there's so much data, but then all of a sudden when you say, okay, let's add in, you know, videos and images and audio, now all of a sudden there's an enormous amount more data. And you think about how people learn and experience the universe, a little bit is by reading books, but a lot of our experience is visual and is really, you know, experiencing it through seeing things.

And now our models can do that as well.

Yeah. How does the research to understand what's happening with these models work, whether it's a diffusion-based model or a transformer, given the sheer size of these things? How do you go in and understand what's actually happening under the hood? Yeah, I mean, it's a case-by-case thing. And you have to do pilot studies. In fact, very often when we build our big models internally, it actually stops being research. It becomes a production task. Right. Because

You know, a lot of resources are being applied. A lot of people are being applied. We're curating lots of data. But before we set up for that, we'll try to do little pilot studies. We'll say, you know, we do ablation studies: let's take this away and see what happens, take that away and see what happens. And then we also try to just develop some math behind what's going on.

So we can sort of predict that if we do something, what will happen. And then from that, we start developing an understanding of what's going on, what's really represented by this embedding, by this latent space and being able to anticipate and predict what would happen if we make a change to the model or to the process or something like that. Right.
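
A toy sketch of the pilot/ablation workflow described above (illustrative only, random data and a placeholder model; not NVIDIA's process): train small variants with one component removed at a time and compare validation loss before committing resources to a full-scale run.

```python
import torch
import torch.nn as nn

def make_model(use_norm=True, width=64):
    """Small pilot model; `use_norm` is the component being ablated."""
    layers = [nn.Linear(16, width), nn.ReLU()]
    if use_norm:
        layers.append(nn.LayerNorm(width))
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def pilot_run(model, steps=200):
    """Train briefly on (random) data and report validation loss."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_val, y_val = torch.randn(256, 16), torch.randn(256, 1)
    for _ in range(steps):
        x, y = torch.randn(64, 16), torch.randn(64, 1)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return nn.functional.mse_loss(model(x_val), y_val).item()

for name, model in {"baseline": make_model(), "no layernorm": make_model(use_norm=False)}.items():
    print(f"{name}: val loss {pilot_run(model):.4f}")
```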

So like you said, it's a challenge of production. So even just this research to understand what's happening, just like building any of these models must require enormous scale in terms of compute, in terms of data. I mean, it's like you're building these models on your own from scratch. Yeah, well, we are actually in some cases, but we try, you know, when you put that many resources into something, you have to be pretty sure it's going to work or you have a difficult conversation with Jensen at the end of the day. Right.

And so you try to do the little pilot experiments in advance of that so that when you run off a big training run, you have a very high probability of success. Right. What else on the research side is getting you excited right now? I've heard you talk a little bit about autonomous vehicles. Is that an area where your team is spending a lot of time on research? Yeah, we have one group doing autonomous vehicle research, and they're very closely collaborating with our autonomous vehicle product team.

There's some exciting things going on there, actually applying foundation models to autonomous vehicles, but both as a way of creating a training environment, you know, creating, you know, being able to write a prompt and as a result, get a scenario that you can then simulate and run your car through, but also having a model that you can use for the perception and planning and prediction of what the other, you know, actors in the scene are going to be doing.

And so there's a lot of very exciting things coming together at that nexus of generative AI and autonomous vehicles.

Now, there's also a bunch of exciting things going on on the supply side. I mean, we're constantly pushed to say, you know, how can we stay ahead? We, you know, in my opinion, I think we have the best platform for AI today. And, you know, but people, as soon as we announce a GPU, other people can sort of copy what we've done, right? And, you know, in four years' time, they'll have a platform that's probably as good as ours today. So how can we continue to stay ahead? And there's some pretty...

Pretty exciting things we're doing on that front as well in terms of new number representations for AI, new ways of handling sparsity of both the weights and activations in these models, and just ways of making the platform more efficient. So for a given amount of silicon area, given amount of power, how can we get more out of it?
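
One concrete form the sparsity idea takes is 2:4 structured weight sparsity, a pattern recent NVIDIA Tensor Cores can exploit. The sketch below is an illustration of the pruning pattern only, not NVIDIA's implementation: keep the two largest-magnitude weights in every group of four and zero the rest, so hardware that understands the pattern can skip the zeros.

```python
import torch

def prune_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero all but the 2 largest-magnitude values in each group of 4 weights."""
    w = weight.reshape(-1, 4)                      # groups of 4 along the flattened weight
    idx = w.abs().topk(2, dim=1).indices           # positions of the 2 largest magnitudes
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 16)
w_sparse = prune_2_of_4(w)
# Every group of 4 now has exactly 2 nonzeros: half the math and storage
# for hardware that can exploit the structure.
print(w_sparse.reshape(-1, 4).count_nonzero(dim=1))
```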

Yeah, I've seen, you know, I've seen other companies recently announce, you know, really large chips or, you know, lots of different takes on the architecture. And, you know, it often makes me wonder about, well, I wonder what NVIDIA is going to do next. It sounds like you're thinking multiple steps ahead and maybe already have the roadmap planned out on the supply side for multiple iterations of chips and designs.

Yeah, we have to stay ahead. I mean, the things we're going to do for the next couple of generations are already pretty much in the bag. And I can't talk very much about those. Of course. Yeah. But in research, we're trying to look beyond that and say, what are the things, you know, three, four, five generations out. Right. Yeah.

And it's fun. You know, in some sense, it makes computer design even more fun than when Moore's Law was in place, because back then you made small tweaks, ran stuff in the new process, and you got a faster CPU. Now, you know, we're getting maybe 10 percent out of a new generation of technology. So, you know, most of what makes it better is better computer architecture, better software,

you know, better design, you know, things from the creative process, not from the semiconductor process. So maybe applying what we talked about earlier with, you know, the explosion of generative AI over the past couple of years back to autonomous vehicles, what has impressed you about, you know, maybe some of the advancements in autonomous vehicles over the past year or two?

It seems like on the consumer side, there's been some pretty big breakthroughs. Waymo is now doing, I think, tens of thousands of trips. Obviously, Tesla full self-driving seems to be getting more and more reliable. Are we getting closer to this? And how does the work that NVIDIA does contribute? Yeah, I mean...

So it's hard to say. I mean, it's one of these things that's a very difficult problem. In fact, you know, I'm on the record of a decade ago saying that we were almost there. And, you know, a decade later, we're not. And the reason is, it's one of these things where you've got to get that long tail. And it's really a problem of chasing the rare cases and making sure

They're handled well. You know, I think, you know, the leaders in this field, the Waymos of the world, are doing a great job of that. It involves an enormous amount of data. It involves, you know, having great discipline, a real safety culture, to make sure that under situations that you have not anticipated, the vehicle is going to respond correctly and everybody's going to remain safe.

So it's, you know, we've done some really exciting things that I find interesting from a technology point of view with these generative models. And, you know, we've played around with this, you know, various approaches to the architecture of the vehicle, you know, where we have the classic stages of perception and planning and control and the like. And then we also have, you know, versions that are end to end.

And sometimes we try to combine those where we have the usual stages so we can reach in and both observe and control, but we also have trained them together. So we've actually trained our perception with a loss function that is conditioned by what the planning is. So it's perception tuned to what it's going to be used for. And so there are exciting things in the technology there, but ultimately it's a tough game of chasing down the

the rare cases and making sure you handle them well. And it would be a much easier task if it was only autonomous vehicles on the road. You have to deal with these pesky humans that are out there that do difficult to predict things. Right. So you're saying the autonomous vehicles, they'll do predictable things. What humans...

will do is completely unpredictable. Well, you can predict them, but every human is different and they may act in a different way. So you're predicting what the most likely human would do. And we even try to develop technology which, by observing the actors, both the cars and the pedestrians and the like, tries to characterize them: that one is distracted, this one is aggressive, that one is about to fall asleep, and then predicts what they will do based on that characterization. But even then it can be hard.
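
As an aside on the point a moment ago about training perception with a loss conditioned on planning, here is a minimal sketch of that idea (placeholder modules and fake data; not NVIDIA's driving stack): gradients from the planning loss flow back into the perception module, so perception is tuned for what the planner actually needs.

```python
import torch
import torch.nn as nn

perception = nn.Sequential(nn.Linear(512, 128), nn.ReLU())                   # stand-in sensor encoder
planner    = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))  # stand-in trajectory head

opt = torch.optim.Adam(list(perception.parameters()) + list(planner.parameters()), lr=1e-4)

sensors      = torch.randn(32, 512)     # fake sensor features
target_plan  = torch.randn(32, 2)       # fake expert trajectory targets
target_scene = torch.randn(32, 128)     # fake auxiliary perception targets

scene = perception(sensors)
plan  = planner(scene)

# Joint loss: the planning error dominates, with a smaller auxiliary perception
# term, so perception learns features that serve the planner.
loss = nn.functional.mse_loss(plan, target_plan) + 0.1 * nn.functional.mse_loss(scene, target_scene)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```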

That's interesting. So the models will characterize drivers down to the individual car?

And say this driver is this type of driver, this driver is that type of driver. Yeah. You need to characterize what a particular car will do, because they're not all going to do the same thing. And you can, just like you would if you went out on the road and observed them and formed a view of what you think they're going to do. Oh, wow. Right. That's fascinating. That's fascinating. You mentioned something earlier about autonomous vehicles, how you're even using large language models to invent scenarios and test cases. I think I remember you saying that.

I think I read somewhere that NVIDIA has basically been applying generative AI to many stages of chip design, even. Talk to me a little bit about that. Like, how is NVIDIA actually leveraging this technology to help make chip design more efficient? Yeah, that's a great question. So we have a bunch of projects to apply AI to chip design to make, you know, to basically eat our own dog food in some sense. Probably the most exciting one is one where we've taken large language models and

and then specialize them with what's called domain-specific pre-training. So we basically take a lot of data that we have, our entire repository of previous GPU designs, tests,

design documentation, and train up the model on this. And what we found is that then we can get a model which is much better than just a general model, even a huge model like GPT-4. We can take a Llama 7B or something, train it up on our own data, and it's better than a larger model

at a number of tasks, and the most important ones are tasks that assist a designer to make them more productive. One thing we found is that junior designers tend to use a lot of senior designers' time asking questions. And it's part of the process of becoming part of the team and learning how GPUs work and all that. But now we can have them ask the model a question and it gives them pretty good answers, which not only makes them more productive, but makes the people whose time they were using to answer the question more productive.
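
A rough sketch of the domain-specific (continued) pre-training recipe described above, using the Hugging Face transformers library. The base model name and corpus path are placeholders, and this is not NVIDIA's pipeline; it just shows the shape of the idea: keep training an open base model, with the ordinary causal-LM objective, on an internal corpus of design docs, tests, and code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"          # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

with open("internal_design_docs.txt") as f:      # hypothetical internal corpus
    corpus = f.read()

enc = tokenizer(corpus, return_tensors="pt", truncation=True, max_length=1024)
for step in range(100):                          # a real run would stream many data shards
    out = model(**enc, labels=enc["input_ids"])  # labels=input_ids gives the causal-LM loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```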

These models have also been very good at summarizing bugs. So you get a bug report that may be many pages long and it's a bunch of logs out of, you know, some, you know,

test case where the test went awry. It can summarize that bug now, and in many cases it will also ARB somebody, saying that there's an action required by a particular designer to now fix the bug. That makes the process go better as well. In some cases, we have the models writing code, but we more often will have them writing test code, or code that configures a particular design tool to do something, than writing the code for the GPU itself.

And then there are also applications where we take this technology and we use it as part of the design process. One that I particularly like is we've developed a graph neural net that can take a circuit design and predict what the parasitics are going to be. And this is a huge productivity gain because, you know, it used to be the circuit designer would draw the circuit, hand it off to a layout designer, and a couple days later, the layout designer would finish the layout design.

And you'd extract the parasitics and the circuit designer would find it doesn't work because the parasitics are worse than I thought they would be. And they would have to try something different. And so the design cycle was, you know, a couple of days around that loop. But with this tool, which, you know, it doesn't get it exactly right, but it's very good at predicting what the parasitics will be. It's now seconds around that loop. So the designer draws a schematic, predicts the parasitic, runs a simulation. Now they can iterate quickly while they still have everything in their head about what they were working on.
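
A toy sketch of the graph-neural-net idea (an illustration, not NVIDIA's model): a couple of rounds of message passing over a small netlist graph, followed by a per-node regression of a parasitic estimate, so the designer can iterate from the schematic before any layout exists.

```python
import torch
import torch.nn as nn

class TinyGNN(nn.Module):
    def __init__(self, in_dim=8, hidden=32):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.msg1 = nn.Linear(hidden, hidden)
        self.msg2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)          # predicted parasitic value per node

    def forward(self, x, adj):                    # adj: row-normalized adjacency matrix [N, N]
        h = torch.relu(self.embed(x))
        h = torch.relu(self.msg1(adj @ h) + h)    # message-passing round 1 (with residual)
        h = torch.relu(self.msg2(adj @ h) + h)    # message-passing round 2
        return self.head(h).squeeze(-1)

n_nodes = 6
x = torch.randn(n_nodes, 8)                       # per-device features (type, size, ...)
adj = torch.eye(n_nodes)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1 # tiny hand-wired connectivity
adj = adj / adj.sum(dim=1, keepdim=True)          # simple row normalization

model = TinyGNN()
print(model(x, adj))                              # one parasitic estimate per node
```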

Another really cool one is we apply reinforcement learning to designing the adders in our GPUs. This is a critical circuit. And it's also something that people have been thinking about really hard since the 1950s. And so there are textbooks written about how to design good adders. And it boils down to this problem of structuring a tree that does what's called a parallel prefix calculation.

It's doing a running sum across the carries of the bits to decide whether you have a carry into a particular bit of the adder. And in this problem that people have been beating on since the 1950s, it turns out we applied reinforcement learning. We treated it like an Atari game of where you put the next carry look ahead node in the tree.
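
For readers who want the prefix-carry idea spelled out, here is the computation the adder tree implements, written serially; this is the textbook carry-lookahead formulation, not NVIDIA's reinforcement-learning setup. Each bit has a generate and a propagate signal, and the carries are a running prefix combination of those pairs; the RL agent's job, as described above, is choosing where to place nodes in a tree that evaluates this prefix in parallel with low depth and cost.

```python
def combine(left, right):
    """Associative carry-lookahead operator on (generate, propagate) pairs;
    `left` is the more significant group, `right` the less significant one."""
    gl, pl = left
    gr, pr = right
    return (gl | (pl & gr), pl & pr)

def add(a_bits, b_bits):
    """Add two little-endian bit lists using generate/propagate and a prefix scan."""
    gp = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]  # (generate, propagate) per bit
    carries, acc = [0], None                               # carry into bit 0 is 0
    for g, p in gp[:-1]:                                   # serial prefix scan; hardware uses a tree
        acc = (g, p) if acc is None else combine((g, p), acc)
        carries.append(acc[0])                             # group-generate = carry into next bit
    return [p ^ c for (_, p), c in zip(gp, carries)]       # sum bit = propagate XOR carry-in

# 5 + 3 = 8, using 4-bit little-endian operands
print(add([1, 0, 1, 0], [1, 1, 0, 0]))   # -> [0, 0, 0, 1]
```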

And it wound up beating the best known techniques by a substantial amount as well. And so this is something that's applied now to the design of the arithmetic circuits in our GPUs. You know, another neat one is the productivity increases. Every time we move to a new technology, we go from five nanometers to three nanometers to two nanometers.

We have to redo the entire standard cell library or even within a particular node. If we are targeting, you know, a different foundry, we have to redo the standard cell library for that foundry. That used to take a team of about 10 people, about nine months. So think of 90 person months. Now we have a reinforcement learning program

that basically designs the standard cells and it achieves a higher quality so that the average cell is smaller than the ones designed by the humans and better in a few other metrics as well. But it does it in an overnight run on one GPU. And so that's a great example of applying this AI technology to making the GPUs better. It's really cool. I mean, you're obviously hearing...

stories like this from all different types of companies about how they're leveraging AI. And so it makes sense that NVIDIA, the company that is in many ways inventing and creating AI, is leveraging AI to make AI more efficient. It's really cool to hear some of those examples. Maybe let's get into the team a little bit. You mentioned, I think you have like several hundred researchers with PhDs at NVIDIA. Is that right? I read that somewhere. It's about 400. Yeah.

How do you recruit and build a team like that? I mean, these are some of the smartest and brightest people in the world. How does assembling a team like that happen? You know, it took a long time. So I came to NVIDIA in 2009 and inherited a team of, you know, I think it was like about 15 people, most of whom were doing ray tracing, you know, computer graphics. And, you know, from there, you know, created groups, you know, doing...

you know, architecture and circuits and doing AI. And when we first started in any given area, it was very hard because no one wants to come to a place where they're the only one doing something. But by getting some really good people to anchor each place and then hiring really good people, it then becomes easier to recruit talent because people like to join, you know, a team where there are other fun people to talk to and everybody is as smart as you are.

And so we found that we had to really set the bar high and hold it there. You know, if we were to let that bar drop and start hiring mediocre people, they would hire more mediocre people. So we've had to keep it high. And we try to create an environment where people like to be. So we have very little turnover. People come and they stay because, you know, they get to do what they want to do. They have the resources to do fun experiments. They get to work with fun people.

And they get to have an enormous amount of impact. One great thing about NVIDIA is because we supply the whole industry. If you develop, you know, whether it's a piece of new hardware for AI or a new type of model, a new training technique, it winds up benefiting everybody, benefiting the whole world. Whereas in some of the people we're competing with for talent, if they develop something, then their company will use that, but it won't be spread as widely as the things that we develop.

It strikes me that, you know, we talked about your amazing career and this team of several hundred researchers. You've been through several shifts, like big platform shifts, this one maybe being the biggest and maybe being the most relevant to your work. How does this team, your team that we just talked about,

kind of stay prepared and stay ahead of these shifts? And maybe what advice would you have for entrepreneurs or startups that are also building in this space?

Yeah, so that's a really good question. So the best way to stay ahead of the revolution is to be the revolutionaries to create this shift. But, you know, we can't create all of them ourselves. And although, you know, we actually have developed many of the fundamental technologies along the way, some of them have come from outside. And so the other thing we tend to do is we tend to have a set of core, you know, core technologies, core expertise that

and then be very agile in applying that to different things. And so I would say, you know, at NVIDIA, you know, our core expertise is parallel processing and acceleration. We build processors that have, you know,

hundreds of thousands of elements working in parallel. And then we specialize them over the years. We specialized them for raster graphics, you know, polygon-based rendering. That was the core technology from the early days of NVIDIA. We specialized them for ray tracing; we had our RT cores. We've specialized them for bioinformatics with the dynamic programming instructions

that came out in the Hopper generation. And then starting with really in the Pascal generation, but when we introduced the Tensor cores in Volta, we added specialization for AI. And so those two technologies, parallel processing and domain-specific acceleration are very powerful. And so what we have to do then is to anticipate

What is the next big application shift that is going to demand a different type of specialization? The parallel processing is quite universal. You can apply that to everything. Key applications need different domain-specific acceleration. And even AI, as it shifted over time, what that domain-specific acceleration is has shifted. And so we need to be agile in taking those two core technologies and anticipating the applications and getting ahead of them.

And I think that's what anybody should be trying to do. They should have a core expertise and get ahead of the applications. Right. You were in a very interesting place where you were able to see a lot of this stuff coming. As you said, maybe you didn't expect it to happen as big and as quickly as it has, but, you know, what are the things that you think maybe entrepreneurs are going to need to prepare themselves for over the coming years?

Yeah, you know, you're talking about new technologies coming along. So we know what everyone's building for today. You know, what might they be building for two years from now, three years from now? Well, to me, if I look back and try to use that as a way of predicting forward, you know, a decade ago, we were worried about ConvNets and recurrent neural networks.

And then transformers came around and all of a sudden nobody cares about recurrent neural networks anymore. And then you were doing GANs for image synthesis and diffusion networks came along. And so what we have to be able to do is to anticipate new models coming along. People are always developing new models. It's just most of them are no better than the old models and so don't get adopted.

And so we're constantly, you know, on our toes trying to figure out what's next. And it's hard to say. We'll look at half a dozen things and none of them will pan out. But we're prepared and we're agile enough that we could, if any of those had taken off, we could have tracked them. Another thing we do is I personally spend a lot of time visiting universities and talking to people who are working, you know, on the, you know, ideas for the next thing to try to at least get a feel for what, you know, what the candidates are, what's out there and what...

what might come into play. - You mentioned there are new models and architectures being developed all the time. You don't really know necessarily which one is going to work. Maybe taking the transformer architecture as an example, when was it obvious that that was going to matter?

Yeah, pretty, pretty soon. Okay. You know, what was it? The paper Attention is All You Need came out in something like 2017. And even at that point, it was pretty clear that transformers were winning. And you have to take the title of that paper into context. The reason for that title was what was considered to be the right model at that point was a hybrid of a transformer with a

with a recurrent network, right? Because the idea was each was giving you something. And then the point of that paper was you didn't need the recurrent network. If you had the transformers, that was all you needed. And the evidence then was that, yeah, that was working better than the recurrent networks. Now, at that point in time, it was being applied to models like BERT, which was, I forget, a couple hundred million parameters. Maybe that's even BERT large. And I think what people hadn't anticipated is how that would scale up.

And as that scaled, it just got better. And it won even more because the real problem with the recurrent networks is that it was a difficult training process to propagate things back through those recurrent cells. And so as you got more data and built larger models, it took way longer to train them and the scaling didn't work out as well.

But for that one, I think it was pretty clear, you know, even in the early days that it was a win. And one thing that's impressed me about the whole, you know, evolution of AI over the past, you know, 15 years or so has been how rapidly people have adopted shifts. You know, I spent a lot of time in the supercomputing world where people would have these, you know, codes that they would have. And if you were a supplier, you know, I worked with Cray for a while, you had to run everybody's codes and

And they would have codes that were 20 years old and they didn't want to change their codes. And so you had this huge inertia in the field of legacy codes that were slow to change. And, you know, a lot of the enterprise computing world works that way as well. I mean, banks are still running code written in COBOL.

But in the AI world, people throw stuff away overnight and tomorrow they have a new model. They don't care about the stuff. It's fun. It moves really quickly. Are there any new or upcoming model architectures that you're particularly intrigued by or excited by? I'm very excited by these state space models. And it's not clear that they're going to win yet, but there's some ideas in there that ultimately...

are probably going to pan out. And in some sense, it's going back to recurrent networks. Right, interesting. Yeah, what about the state space models do you find so interesting? People have done at least a couple of studies showing that, in those studies, smaller models with less training get better results. And if that actually pans out in general, they'll wind up replacing transformers. I don't think they've gotten to that state yet. Got it.

I believe you're an adjunct professor at Stanford, and you were formerly the chair of the Computer Science Department there. You give talks at universities and colleges all across the country.

I guess two questions. What is maybe a piece of advice that you find yourself giving to people that are about to enter the industry? And also, what are you learning from academia for students who are growing up in this sort of age of AI? Okay, those are both really good questions. So, you know, that first one, I was actually asked that after a talk I gave at Georgia Tech, you know, it's probably about a month ago. And, you know, my comments were, first of all, to realize that,

New graduates, really what they have is a license to learn. They've been sort of prepared with a lot of theory and a lot of basics, but are not yet really useful or dangerous in a way. And so it's important for their first job is to pick the job where they're going to learn a lot and to learn the right set of things. And so I think the two characteristics there is to pick a place where

you know, that has a lot of really smart people to work with because you'll learn from them. And that is working on, you know, really leading edge problems because you want to learn stuff about leading edge problems, not about the stuff that is no longer, you know, at the cutting edge. And it's probably also important to have a culture that's good because, you know, if it's not, then you can just wind up getting caught up in a lot of, you know, kind of nasty politics and have an unpleasant experience.

So I like to think that NVIDIA is really the ideal place for all these people to come because we check all three of those boxes. We've got lots of really smart people working on leading edge problems. We've got a great culture.

So what was the other question? What am I learning? What are you learning from students who are growing up with AI, right? Who are living through this stuff? Yeah. So, you know, it's interesting. So this is both students and also the new college grads we hire at NVIDIA. I really enjoy talking with them because they have a different perspective. They've kind of, you know,

they kind of almost take some of this stuff for granted. But then on the other hand, they, from that point of view, when a new technology comes out, they see it differently and hearing their perspective makes me think about it differently. And so I like anytime there's a new technology, I actually like to talk to some of the, you know, the, the new hires about it and see what, what they think. And, and I also enjoy just going to the universities and talking to, you know, to current students, the graduate students who are working on this stuff and, and just getting their perspective because, you know,

That perspective about what is coming next can sometimes be clearer than what I'm seeing because they're seeing it without a bunch of baggage from what we're doing now.

What are, you know, what are students of a decade from now not going to be learning that they're, you know, because of AI that students are actively learning and being a part of right now? What's going to go away? Yeah, that's an interesting question. You know, when I was chair of the computer science department, one of the things I did is I streamlined our curriculum. So we had very few required courses and a whole bunch of people got really unhappy with me because when their course was no longer required, nobody was taking it.

And maybe there should be no required courses and we should just let people take what they want. But I would hope that some things are just core to computer science, you know, algorithms, automata theory, the basic theory of computer science, how to program, how to think computationally. But a lot of what we learned about sort of, you know, structuring things

by, you know, writing classical code is no longer how people build applications, right? Now they build applications by, you know, getting an API to an LLM and, you know, piping their data into that. And so I think, you know, the generation of students that's coming out, you know, even now, but certainly it'll be much more in five or 10 years, we'll be thinking about how to plug together, you know,

through a bunch of APIs, and that's how you're going to build things. And so it's hard, you know, something's going to have to go away to make more time to learn that. And it's hard to see what that will be. But, you know, I think it'll be the more classical ways of programming.
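
A minimal sketch of that style of application building, using the OpenAI Python client as one example; the model name, prompt, and the summarize helper are placeholders, not anything from the conversation.

```python
from openai import OpenAI

# Instead of writing classical logic, pipe your data into an LLM behind an API
# and compose the results. Requires OPENAI_API_KEY in the environment.
client = OpenAI()

def summarize(bug_report: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # placeholder model choice
        messages=[
            {"role": "system", "content": "Summarize the bug report in three bullet points."},
            {"role": "user", "content": bug_report},
        ],
    )
    return response.choices[0].message.content

print(summarize("Test 42 failed: timeout waiting on memory controller after reset."))
```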

So, yeah, so that sounds like you're saying coding and programming almost go away, in a sense. Well, somebody's got to write some code. So we have, you know, PyTorch and things like that. Right. Super interesting. Bill, this has been fascinating. I've learned so much, and I'm sure the audience will as well. So I want to thank you so much for your very, very generous time. We know you're a busy guy. So thanks so much. This is fun, Michael, and I look forward to hearing the podcast.

Thank you so much for listening to Generative Now. If you liked this episode, please do us a favor and rate and review the podcast on Spotify and Apple Podcasts. That really does help. And if you want to learn more, follow Lightspeed at LightspeedVP on YouTube, X (Twitter), LinkedIn, and everywhere else. Generative Now is produced by Lightspeed in partnership with Pod People. I am Michael Mignano, and we will be back next week with another conversation. See you then.