NVIDIA has evolved from producing single chips to building entire data centers to ensure software and hardware integration works at scale. They build vertically integrated systems, optimize them full stack, and then disaggregate components for sale. This approach allows NVIDIA to graft its infrastructure into major cloud platforms like GCP, AWS, and Azure, ensuring CUDA, their computing platform, is consistent across environments.
Andrej Karpathy believes future AI models could be much smaller than current ones, potentially as small as 1 billion parameters. He argues that current models waste capacity on irrelevant data, like SHA hashes, and that distillation techniques can effectively reduce model size while maintaining performance. The cognitive core of AI, which focuses on thinking and using tools, can be extremely compact.
Bret Taylor predicts that businesses will transition from websites to branded AI agents that handle customer interactions, including product inquiries, commerce, and customer service. These agents will become the primary digital presence for companies, similar to how websites were in the 1990s. Sierra, his company, is already building such agents for clients like Sonos and SiriusXM.
The OpenAI Sora team highlighted that their video model, Sora, learns about the world, including 3D structures and physical interactions, purely from visual data. This grounding in visual information is crucial for developing more intelligent AI models that better understand the world. They believe Sora’s ability to model the world will contribute significantly to the path toward AGI.
Dmitri Dolgov of Waymo explains that the difficulty lies in achieving 100% accuracy, which requires solving the long tail of rare edge cases. While advanced driver assistance systems can handle many scenarios, full autonomy demands near-perfect reliability across millions of miles, a much harder problem than initial prototyping or driver-assisted systems.
Dylan Field believes that while conversational and agent-based interfaces will grow, traditional UIs will not disappear. Instead, new modalities like voice and intelligent cameras will complement existing interfaces. He predicts that UI will become more sophisticated, and users will interact with AI through a mix of methods rather than relying solely on one type of interface.
Alexandr Wang compares the path to AGI to curing cancer, where solving many small, independent problems is necessary rather than achieving a single breakthrough. He believes there is limited generalization across modalities and that each niche capability will require separate data flywheels. This approach suggests a slow, incremental progress toward AGI rather than a sudden leap.
Hi, NoPriors listeners. I hope it's been an amazing 2024 for you all. Looking back on this year, we wanted to bring you highlights from some of our favorite conversations. First up, we have a clip with the one and only Jensen Huang, CEO of NVIDIA, the company powering the AI revolution.
Since our 2023 NoPriors chat with Jensen, NVIDIA's tripled in stock price, adding almost $100 billion of value each month of 2024 and entering the $3 trillion club. More recently, Jensen shared his perspective again with us, this time on why NVIDIA is no longer a chip company, but a data center ecosystem.
Here's our conversation with Jensen. NVIDIA has moved into larger and larger, let's say, units of support for customers. I think about it going from single chip to, you know, server to rack and NVL72. How do you think about that progression? What's next? Could NVIDIA do a full data center? In fact, we do build full data centers. That's the way we build everything: if you're developing software, you need the computer in its full manifestation. We don't build PowerPoint slides and ship the chips; we build a whole data center. Until we get the whole data center built up, how do you know the software works? Until you get the whole data center built up, how do you know your fabric works, and that all the efficiencies you expected will really hold at scale?
And that's the reason why it's not unusual to see somebody's actual performance be dramatically lower than their peak performance as shown in PowerPoint slides. Computing is just not what it used to be. I say that the new unit of computing is the data center. To us, that's what you have to deliver; that's what we build. And we build every single configuration, every combination: air-cooled, x86, liquid-cooled, Grace, Ethernet, InfiniBand, NVLink, no NVLink, you know what I'm saying? We have five supercomputers in our company today.
Next year, we're going to build easily five more. So if you're serious about software, you build your own computers; you build the whole computer. And we build it all at scale. This is the part that is really interesting: we build it at scale and we build it vertically integrated. We optimize it full stack, and then we disaggregate everything and sell it in parts.
That's the part that is completely, utterly remarkable about what we do. The complexity of it is just insane. And the reason is that we want to be able to graft our infrastructure into GCP, AWS, Azure, OCI. Their control planes and security planes are all different, and the way they think about cluster sizing is all different. Yet we make it possible for all of them to accommodate NVIDIA's architecture, so that CUDA can be everywhere. That's really, in the end, the singular thought: we would like a computing platform that developers can use that's largely consistent, modulo 10% here and there because people's infrastructures are optimized slightly differently, but everything they build will run everywhere.
This is one of the principles of software that should never be given up, and we protect it quite dearly. It makes it possible for our software engineers to build once and run everywhere. That's because we recognize that the investment in software is the most expensive investment, and it's easy to test: look at the size of the whole hardware industry, and then look at the size of the world's industries. It's $100 trillion on top of this $1 trillion industry, and that tells you something. The software you build, you basically have to maintain for as long as you shall live. We, of course, have to mention our conversation with the lovely Andrej Karpathy, where we dig into the future of AI as an exocortex, an extension of human cognition.
Andrej, who's been a key figure in AI development, from OpenAI to Tesla to the education of us all, shares a provocative perspective on ownership of and access to AI models, and also makes a case for why future models might be much smaller than we think. If we're talking about an exocortex, that feels like a pretty fundamentally important thing to democratize access to. How do you think about the current market structure of LLM research, where a small number of large labs actually have a shot at progressing next-generation training? How does that translate to what people have access to in the future?
So what you're kind of alluding to, maybe, is the state of the ecosystem, right? We have kind of an oligopoly of a few closed platforms, and then we have an open platform that is kind of behind, like Meta's Llama, etc. And this is kind of mirroring the open-source ecosystem. I do think that when we start to think of this stuff as an exocortex... There's a saying in crypto: not your keys, not your tokens. So is it the case that if it's not your weights, it's not your brain? That's interesting, because a company is effectively controlling your exocortex, and therefore part of your... Yeah, it starts to feel kind of invasive. If this is my exocortex... I think people will care much more about ownership, yes. Yeah, you realize you're renting your brain.
It seems strange to rent your brain. The thought experiment is: are you willing to give up ownership and control to rent a better brain? Because I am. Yeah. So I think that's the trade-off. We'll see how it works out. But maybe it's possible to use the closed versions by default, because they're amazing, and have a fallback in various scenarios. And I think that's kind of the way things are shaping up even today, right? When the APIs of some of the closed-source providers go down, people start to implement fallbacks to the open ecosystems that they fully control, and they feel empowered by that. So maybe that's just what the extension will look like for the brain: you fall back on the open-source stuff should anything happen.
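That fallback pattern is straightforward to sketch in code. Below is a minimal illustration of the idea as described, assuming a hosted closed model preferred for quality, with open weights you control as the backstop; call_closed_api and call_local_model are hypothetical stand-ins, not real provider SDKs.

```python
def call_closed_api(prompt: str) -> str:
    # Hypothetical stand-in for a hosted closed-model API call.
    # Here it simulates an outage to exercise the fallback path.
    raise ConnectionError("closed provider is down")

def call_local_model(prompt: str) -> str:
    # Hypothetical stand-in for inference against open weights you host.
    return f"[local open-model reply to: {prompt}]"

def generate(prompt: str) -> str:
    """Prefer the closed model for quality; fall back to the open
    model you fully control if the hosted API is unavailable."""
    try:
        return call_closed_api(prompt)
    except (ConnectionError, TimeoutError):
        return call_local_model(prompt)

print(generate("summarize my notes"))  # falls back to the local model
```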
But most of the time, you actually... So it's quite important that the open-source stuff continues to progress. I think so, 100%. And this is not an obvious point, or something people necessarily agree on right now, but I think so, 100%. I guess one thing I've been wondering about a little bit is...
What is the smallest performant model that you can get to, in some sense, either in parameter size or however you want to think about it? I'm a little bit curious about your view, because you've thought a lot about distillation and small models. I think it can be surprisingly small. And I do think the current models are wasting a ton of capacity remembering stuff that doesn't matter. They remember SHA hashes; they remember all this ancient stuff. Because the dataset isn't curated the best? Yeah, exactly. And I think this will go away. We just need to get to the cognitive core, and I think the cognitive core can be extremely small.
It's just this thing that thinks, and if it needs to look up information, it knows how to use different tools. Is that like 3 billion parameters? Is that 20 billion parameters? I think even a billion suffices. We'll probably get to that point, and the models can be very, very small. And I think the reason they can be very small is, fundamentally, that distillation works; that may be the only thing I would say. Distillation works surprisingly well. Distillation is where you get a really big model, or a huge amount of compute or something like that, supervising a very small model.
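For readers who want to see the mechanics, here is a minimal sketch of that teacher-student setup, assuming the classic soft-label distillation objective; the logits and sizes are toy placeholders, not any particular model's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Train the small student to match the big teacher's softened
    output distribution (soft-label knowledge distillation)."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between the two distributions, scaled by t^2
    # so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

# Toy usage: a frozen "teacher" and a trainable "student" over 10 classes.
teacher_logits = torch.randn(4, 10)                      # from the big model
student_logits = torch.randn(4, 10, requires_grad=True)  # from the small model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```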
Our conversation with Bret Taylor, OpenAI board member and founder of Sierra, painted a really different picture of how we'll interact with businesses in the future. Here's a clip of Bret explaining company agents and why the website is going to take a backseat. The other category, which is the area my company, Sierra, works in, is what I call company agents. And it's really less about automation or autonomy and more about,
in this world of conversational AI, how your company exists digitally. I always use the metaphor of 1995: if you existed digitally, it meant having a website and being in the Yahoo directory, right? In 2025, existing digitally will probably mean having a branded AI agent that your customers can interact with to do everything they can do on your website, whether it's asking about your products and services, doing commerce, or doing customer service.
That domain, I think, is shovel-ready right now with current technology because, again, like the persona-based agents, it's not boiling the proverbial ocean, technically. You have well-defined processes for your customer experience and well-defined systems of record. And it's really about saying: in this world where we've gone from websites to apps to, now, conversational experiences, what is the conversational experience you want around your brand? That doesn't mean it's perfect or easy; otherwise we wouldn't have started a company around it. But it's at least well-defined. And I think that right now in AI, if you're working on artificial general intelligence, your version of agent probably means something different, and that's okay. That's just a different problem to be solved.
But particularly in the areas Sierra works in, and for a lot of the companies you all have invested in, the question is: are there shovel-ready opportunities right now with existing technology? And I absolutely think there are. Can you describe the shoveling cycle of building a company agent? What is the gap between research and reality? What do you invest in as an engineering team? How do you understand the scope of different customer environments? What are the vectors of investment here? And, sorry to interrupt, as a starting point it may be worth defining the products Sierra provides for its customers today, and where you want that to go; then we can feed that back into the components. Obviously you folks are really emerging as a leader in your vertical, but it'd be great for a broader audience to understand what you focus on. Yeah, sure. I'll give a couple of examples to make it concrete. If you buy a new Sonos speaker, or you're having
technical issues with your speaker and you get the dreaded flashing orange light, you'll now chat with the Sonos AI, which is powered by Sierra, to help you onboard and to help you debug whether it's a hardware issue, a Wi-Fi issue, things like that. If you're a SiriusXM subscriber, their AI agent is named Harmony, which I think is a delightful name. It does everything from upgrading and downgrading your subscription level to, if you get a trial when you purchase a new vehicle, speaking to you about that.
Broadly speaking, I would say we help companies build branded, customer-facing agents. And branded is an important part of it: it's part of your brand, part of your brand experience. I think that's really interesting and compelling because, going back to the proverbial 1995, your website was on your business card. It was the first time you had this digital presence. There's the same novelty now, and we'll probably look back at the agents of today with the same sense of, oh, that was quaint. If you go back to the Wayback Machine and look at early websites, it was either someone's phone number and that's it, or it looked like a DVD intro screen with lots of graphics. A lot of the agents that customers start with are often around customer service, which is a really great use case.
But I truly believe that if you fast forward three or four years, your agent will encompass all that your company does. I've used this example before, but I like it: imagine an insurance company and all that you can do when you engage with them. Maybe you're filing a claim. Maybe you're comparing plans. We were talking about our kids earlier; maybe you're adding your child to your insurance policy when they get old enough to have a driver's license.
All of the above will be done by your agent. So that's what we're helping companies build. Next, we talked to the Sora team at OpenAI, which is building an incredibly realistic AI video generation model. In this clip, we talk about their research and how models that understand the world fit into the road to AGI. Is there anything you can say about how the work you've done with Sora affects the broader research roadmap? Yeah, I think something here is about
the knowledge that Sora ends up learning about the world just from seeing all this visual data. It understands 3D, which is one cool thing, because we didn't train it to. We didn't explicitly bake 3D information into it whatsoever. We just trained it on video data, and it learned about 3D because 3D exists in those videos. And it learned that when you take a bite out of a hamburger, you leave a bite mark. So it's learning so much about our world, and
when we interact with the world, so much of it is visual. So much of what we see and learn throughout our lives is visual information. So we really think that, in terms of intelligence, in terms of leading toward AI models that are more intelligent and better understand the world like we do, it will be really important for them to have this grounding of: hey, this is the world we live in. There's so much complexity in it, so much about how people interact, how things happen, how events in the past end up impacting events in the future. This will lead to much more intelligent AI models, much more broadly than just generating videos. It's almost like you invented the future visual cortex plus some part of the
reasoning parts of the brain or something, sort of simultaneously. Yeah, and that's a cool comparison, because a lot of the intelligence that humans have is actually about world modeling, right? All the time, when we're thinking about how we're going to do things, we're playing out scenarios in our head. We have dreams where we're playing out scenarios in our head. We're thinking in advance of doing things: if I did this, this thing would happen; if I did this other thing, what would happen? So we have a world model, and building Sora as a world model is very similar to a big part of the intelligence that humans have.
How do you guys think about the sort of analogy to humans as having a very approximate world model versus something that is as accurate as, like, let's say, a physics engine in the traditional sense, right? Because if I, you know, hold an apple and I drop it, I expect it to fall at a certain rate. But most humans do not think of that as articulating a path with a speed as a calculation. Do you think that sort of learning is, like, parallel in large models? Yeah.
I think it's a really interesting observation.
The way we think about it is that it's almost a deficiency in humans that our world model is not so high fidelity. The fact that we can't actually do very accurate long-term prediction, when you get down to a really narrow set of physics, is something these systems can improve upon. So we're optimistic that Sora will supersede that kind of capability and, in the long run, will one day be a more capable world model than humans are. But it is certainly an existence proof that that's not necessary for other types of intelligence. Regardless, it's still something that Sora and future models will be able to improve upon. Okay, so it's very clear that the trajectory prediction for throwing a football is going to be better in the next versions of these models than mine is, let's say. If I could add something to that: this relates to the paradigm of scale and
the bitter lesson, about how we want methods that get better and better as you increase compute. And something that works really well in this paradigm is doing the simple but challenging task of just predicting data. You can try coming up with more complicated tasks, for example something that doesn't use video explicitly but operates in some space that simulates approximate things. But all this complexity actually isn't beneficial when it comes to the scaling laws of how methods improve as you increase scale. What works really well as you increase scale is just predicting data. That's what we do with text: we just predict text. And that's exactly what we're doing with visual data with Sora. We're not concocting some complicated new thing to optimize. We're saying: hey, the best way to learn intelligence in a scalable manner is to just predict data.
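To make "just predict data" concrete, here is a minimal sketch of the text version of that objective, a next-token cross-entropy training step. The toy model and random token ids are placeholders, and Sora itself applies the same scaling philosophy to visual patches rather than text tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim = 1000, 64
# Toy stand-in for a language model: embed tokens, map back to vocab logits.
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))

def next_token_loss(tokens):
    """Next-token cross-entropy: predict position t+1 from position t.
    A real model would be a transformer over the full context; this
    toy per-token model just illustrates the objective."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one position
    logits = model(inputs)                            # (B, T-1, vocab)
    return F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))

batch = torch.randint(0, vocab, (4, 16))   # dummy token sequences
loss = next_token_loss(batch)
loss.backward()
```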
That makes sense. And relating to what you said, Bill: predictions will just get much better, with no necessary limit that approximates humans. We also sat down with Dmitri Dolgov, co-CEO of Waymo. Today, the company is scaling its self-driving fleet, completing over 100,000 fully autonomous rides per week in cities like San Francisco and Phoenix. It's my favorite way to travel. In this clip, Dmitri explains why achieving full autonomy, removing the driver entirely and hitting 100% accuracy rather than 99.99%, is much harder than it might appear. Why is it different from, let's say, advanced driver assistance, which seems to work in more and more scenarios? What's the delta between that and full autonomy? Yeah. It's the number of nines.
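To see why each additional nine is so hard-won, a quick back-of-the-envelope calculation helps. This sketch assumes, simplistically, an independent per-mile success rate; the numbers are illustrative, not Waymo's.

```python
# Expected interventions over a million miles at various reliability levels,
# assuming (unrealistically) independent per-mile failure probabilities.
miles = 1_000_000
for label, rate in [("99%", 0.99), ("99.9%", 0.999),
                    ("99.99%", 0.9999), ("99.9999%", 0.999999)]:
    expected_failures = miles * (1 - rate)
    print(f"{label} per-mile reliability -> ~{expected_failures:,.0f} "
          f"interventions per {miles:,} miles")
```

Two extra nines cut expected interventions from ten thousand to a hundred per million miles; that gap is the long tail he describes next.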
And it's the nature of this problem, right? If you think about where we started in 2009, one of our first milestones, one of the goals we set for ourselves, was to drive 10 routes. Each one was 100 miles long, all over the Bay Area: freeways, downtown San Francisco, around Lake Tahoe, everything. And you had to do the 100 miles with no interventions; the car had to drive autonomously from beginning to end. That's the goal we created for ourselves. It was about a dozen of us, and it took us maybe 18 months, which we did. In 2009: no ImageNet, no ConvNets, no Transformers, no big models, tiny computers, right? And it was very easy to get started. That's always been the property: with every wave of technology, it's been very easy to get started.
But the hard problem: that early part of the curve has been getting steeper and steeper, but that's not where the complexity is. The complexity is in the long tail of the many, many, many nines. And you don't see that if you go for a prototype or a driver-assist system. This is where we've been spending all of our effort; it's the only hard part of the problem. And it's been getting easier with every technology cycle. So nowadays, with all of the advances
in AI, especially in the generative AI world of LLMs and VLMs, you can take something almost off the shelf. Transformers are amazing; VLMs are amazing. You can take a VLM that accepts images or video and has a decoder, so you can give it text prompts and get text out, and you can fine-tune it with just a little bit of data to go from, let's say, camera data on a car to trajectories, or whatever decisions you make, instead of words. You just take the thing as a black box, take whatever it's been pretrained on, and fine-tune it a little bit. I think if you asked any good grad student in computer science to build an AV today, this is what they would do. And out of the box, you get something that's
amazing, right? The power of transformers, the power of realism, is mind-blowing. So with just a little bit of effort, you get something on the road and it works. You can drive, I don't know, tens, hundreds of miles, and it will blow your mind.
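As a rough illustration of the recipe he describes (and emphatically not Waymo's actual stack), here is a minimal PyTorch sketch: take a pretrained backbone as a black box, bolt on a small head, and fine-tune it to emit trajectories instead of words. The backbone class, sizes, and data here are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class PretrainedBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained vision(-language) encoder;
    in practice you would load real pretrained weights."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, frames):            # frames: (B, 3, H, W)
        return self.encoder(frames)       # (B, embed_dim)

class TrajectoryHead(nn.Module):
    """Decodes backbone features into K future (x, y) waypoints."""
    def __init__(self, embed_dim=256, num_waypoints=8):
        super().__init__()
        self.num_waypoints = num_waypoints
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(),
            nn.Linear(128, num_waypoints * 2),
        )

    def forward(self, feats):
        return self.mlp(feats).view(-1, self.num_waypoints, 2)

backbone, head = PretrainedBackbone(), TrajectoryHead()
for p in backbone.parameters():
    p.requires_grad = False               # treat the backbone as a black box

opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
frames = torch.randn(4, 3, 224, 224)      # dummy batch of camera frames
targets = torch.randn(4, 8, 2)            # dummy ground-truth waypoints

loss = nn.functional.mse_loss(head(backbone(frames)), targets)
loss.backward()                           # fine-tune only the new head
opt.step()
```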
But then, is that enough? Is that enough to remove the driver, drive millions of miles, and have a safety record that is demonstrably better than humans'? No, right? And I guess with every technology evolution and breakthrough in AI, we've seen that. Appreciate it. Up next, we have my dear friend Dylan Field, CEO of Figma. Dylan shares his prediction for how user interfaces will evolve in an AI-driven world.
While many predict a shift toward conversational or agent-based interfaces, Dylan suggests that new interface paradigms will complement existing ones. He also highlights the exciting potential of visual AI and intelligent cameras as the next frontier in input methods. How do you think about the shift in UI, in general, that's going to come with AI? A lot of things are kind of collapsing, in the short run, into chat interfaces, and a lot of people are talking about a future agentic world that does away with most UI altogether, with all the programmatic stuff happening in the background. How do you think about where UI is going right now? I mean, I think this comes back to the Rabbit point I was making earlier. Yes, there's a lot of innovation happening in terms of agents. But in terms of the way we use UI to interact with agents, we're just at the beginning, and I think the interfaces will get more sophisticated. But even if they don't, I suspect it's just like any new media type: when it's introduced, the old media types don't go away, right? Just because you have TikTok doesn't mean you no longer watch YouTube. Even if it's true that a new
form of interaction is via chat interfaces, which I'm not even sure I believe. But even if we take that as a prior on the No Priors podcast, I think you still have UI. And actually, I think you have more UI and more software than before. Do you have any predictions in terms of multimodality? Do you think there's more need for voice? A lot of the debates people have are about when you'll use voice versus text versus other types of interfaces, and you could imagine arguments in all sorts of directions about when you use what. Some people are suggesting that, because of the rise of multimodal models, you'll have more voice input, because you'll be able to do real-time, smart, contextual, semantic understanding of a conversation. So you'd have more of a verbal, conversational UI versus a text-based UI, and that kind of changes how you think about design. I was just curious if you have any thoughts on that sort of future-looking stuff. There are all sorts of contexts where a voice UI is really important. And it might be that we find that voice UIs
start to map to more traditional UIs, because that's something you could obviously do in a more generalized way. But personally, I don't want to navigate the information spaces that I interact with every day, all day, via voice.
I also don't want to do it Minority Report style on the Vision Pro, exactly, either. Maybe with a keyboard and mouse and an amazing Vision Pro monitor setup, or Oculus; that could be cool, but I don't want to do the Minority Report thing. So it's interesting: we get these new glimpses of interaction patterns that are really cool, and the natural inclination is to extrapolate and say they're going to be useful for everything. But I think they have their role, and it doesn't mean they're going to be ubiquitous across every interaction we have. That's a natural cycle to be in, though, and I think it's good. It's healthy to have that almost-mania around what it can do, because if you don't have that, then you don't get to find out. So I'm supportive of people exploring as much as possible, because that's how you progress on HCI and figure out how to use computers to the fullest potential possible. Yeah.
One of the things I am really bullish on, and you might just think of it as an input mode or a peripheral, is that it's really hard for people to describe things visually. So the idea of intelligent cameras, even in the most basic sense, is actually a really fun space to be, as you said, exploring, because I actually think it will be useful. And it's something every user is capable of, right? Taking pictures, capturing video. So I'm pretty bullish on that. To wrap up our favorite moments from 2024, we have Scale CEO Alexandr Wang. In this clip, he shares his bold take on the road to AGI. Alex also dives into why generalization in AI is harder than many think,
and why solving niche problems, plus more data and evals, is key to advancing the technology. What's something you believe about AI that other people don't? My biggest belief here is that the path to AGI looks a lot more like curing cancer than developing a vaccine. What I mean by that is: to build AGI, you're going to have to solve a bunch of small problems where you don't get much positive leverage from solving one problem to solving the next. It's like curing cancer, where you have to zoom into each individual cancer and solve them independently. Eventually, over a multi-decade timeframe, we're going to look back and realize that we've built AGI, we've cured cancer, but the path to get there will have been this quite plodding road of solving individual capabilities and building individual data flywheels to support this end mission. Whereas I think a lot of people in the industry paint the path to AGI as: eventually we'll just, boop, get there; we'll solve it in one fell swoop. And I think there are a lot of implications for how you actually think about the technology arc
and how society is going to have to deal with it. I think it's actually a pretty bullish case for society adapting to the technology, because I think it's going to be consistent, slow progress for quite some time, and society will have time to fully acclimate to the technology that develops. When you say solve a problem at a time, if we pull away from the analogy a little bit, should I think of that as: multi-step reasoning is really hard, Monte Carlo tree search is not the answer people think it might be, and we're just going to run into scaling walls? What are the dimensions of solving multiple problems? I think the main thing, fundamentally, is that there's very limited generality that we get from these models. And even for multimodality, for example,
my understanding is there's no positive transfer from learning in one modality to other modalities. Training on a bunch of video doesn't really help you that much with your text problems, and vice versa. So I think what this means is that each niche or area of capability is going to require separate data flywheels to push through and drive performance. You don't yet believe in video as a basis for a world model that helps? I think that's a great narrative; I don't think there's strong scientific evidence for it yet. Maybe there will be eventually. But I think the base case, let's say, is one where there's not that much generalization coming out of the models, and so we actually just need to slowly solve lots and lots of little problems to ultimately get to AGI. Thank you so much for listening in 2024. We've really enjoyed talking to the people reshaping the world with AI.
If you want to dive more deeply into any of the conversations you've heard today, we've linked the full episodes in the description. Please let us know who you want to hear from and what your questions are for next year. Happy holidays! Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen; that way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.