
Google DeepMind CTO: Advancing AI Frontier, New Reasoning Methods, Video Generation’s Potential

2025/5/20

Big Technology Podcast

People
Koray Kavukcuoglu
Topics
Koray Kavukcuoglu: I think that in the development of AI models, scale and novel techniques are equally important. In research on generative AI models, architectures and algorithms matter as much as scale. Studying how different architectures and algorithms perform under scaling is essential. Data is just as critical as algorithms, architectures, and compute. Inference-time techniques are key to improving a model's reasoning capabilities. I don't see scaling up as a disappointment, because we have kept pushing model capabilities effectively. What matters is pursuing broad research and thinking about scaling from multiple angles. Models across the field are improving significantly, and the whole field is advancing. The Gemini models have made steady progress in both capability and quality. We keep pushing the frontier and are seeing returns across many research directions, and there is more progress to be made to reach AGI.


Chapters
Koray Kavukcuoglu discusses the importance of scale versus novel techniques in advancing AI models. He highlights that scale is a significant factor, but other elements like data, algorithms, and inference-time techniques are equally crucial. The discussion touches upon the progress made in AI models despite diminishing returns from simply increasing scale.
  • Scale is important but not the only factor in advancing AI models.
  • Novel techniques, data, and inference-time improvements are equally crucial.
  • Diminishing returns from solely increasing scale are acknowledged, but progress continues through diverse research directions.

Transcript


What's going on in the heart of Google's AI research operation? We'll find out with Google DeepMind's Chief Technology Officer right after this. From LinkedIn News, I'm Leah Smart, host of Every Day Better, an award-winning podcast dedicated to personal development. Join me every week for captivating stories and research to find more fulfillment in your work and personal life. Listen to Every Day Better on the LinkedIn Podcast Network, Apple Podcasts, or wherever you get your podcasts.

From LinkedIn News, I'm Jessi Hempel, host of the Hello Monday podcast. Start your week with the Hello Monday podcast. We'll navigate career pivots. We'll learn where happiness fits in. Listen to Hello Monday with me, Jessi Hempel, on the LinkedIn Podcast Network or wherever you get your podcasts.

Welcome to Big Technology Podcast, a show for cool-headed and nuanced conversation of the tech world and beyond. We have a great show for you today, a bonus show just as Google's I/O news hits the wire. We have so much to talk about, including what's going on with the company, what it's announced today, but also what is happening in the research effort underlying it all. And we have a great guest for you. Joining us today is Koray Kavukcuoglu.

He is the Chief Technology Officer of Google DeepMind. We're going to speak with Koray today, and then tomorrow you'll hear from DeepMind CEO Demis Hassabis. Koray, great to see you. Welcome to the show. Thank you very much. Folks, by the way, if you're watching on video, Koray and I are in two separate conference rooms in Google's, I don't know, it's a pretty cool new building that they have. It's called what, Gradient Wave or something? We call it the Gradient Canopy.

Gradient canopy. Anyway, we're here and I wanted to ask you a question that we've been asking on the show a lot, which is the scale question. Now, Google has a tremendous amount of compute at your disposal. And so you basically have the option. Is it scale that you want to throw at these models or is it new techniques? So let me just ask it to you as plainly as I can. Is scale the star right now or is it a supporting actor in terms of trying to get models to the next step?

It's a good question, and I like the way you framed it, because scale is definitely an important factor. The way I like to think about this is that it's rare, in any research problem, to have a dimension that pretty confidently gives you improvements, right? Of course, maybe with diminishing returns, but most of the time with research, that's how it is. So,

When we think about our research right now, in the case of generative AI models, scale is definitely one of those dimensions, but it is equally important alongside other things. When we are thinking about our architectures, the architectural elements, the algorithms that we put in there that make up the model, right?

They are as important as the scale. We, of course, analyze and understand, as with scale, how these different architectures and different algorithms become more and more effective. That's an important part, because you know that you are putting in more computational capacity.

And you want to make sure that you research the kinds of architectures and algorithms that pay off the best under that kind of scaling property, right? But as I said, that's not the only one. Data is really important. I think it is as critical as any other thing. The algorithms, architectures, modules that we put into the system is important. Understanding their properties with data, with more compute, that is as important, right?

And then, of course, inference-time techniques are just as important, right? Because now that you have a particular architecture, a particular model, you can multiply its reasoning capabilities by making sure that you can use that model over and over again through different techniques at inference time.
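
To make the idea of re-using a model at inference time concrete, here is a minimal sketch of one such technique: sample the same model several times and keep the majority answer (often called self-consistency). This is only an illustration, not how Gemini is implemented; the `generate` function is a hypothetical stand-in for whatever model API you would actually call.

```python
# Minimal sketch of one inference-time technique: sample the same model
# several times and keep the most common final answer (self-consistency).
# `generate` is a hypothetical placeholder, not a real Gemini/DeepMind API.
import collections
import random
from typing import Callable

def generate(prompt: str) -> str:
    """Placeholder for a sampled (temperature > 0) model call returning a short answer."""
    return random.choice(["42", "42", "41"])  # pretend the model occasionally slips

def self_consistency(prompt: str,
                     model: Callable[[str], str] = generate,
                     n_samples: int = 8) -> str:
    """Query the same model n_samples times and return the majority answer."""
    answers = [model(prompt) for _ in range(n_samples)]
    answer, _count = collections.Counter(answers).most_common(1)[0]
    return answer

if __name__ == "__main__":
    print(self_consistency("What is 6 * 7? Reply with a number only."))
```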

You know, to me, it's both hopeful and puzzling to hear about all the different techniques to make these models better. And I'll explain that. It's hopeful because it seems like we're definitely going to see a lot of improvement from where the models are today. And the models are already pretty good. The thing that's puzzling to me is the idea with scale was there was effectively limitless potential in making these AI models bigger.

And you said the words, diminishing returns. And we've heard that from you and basically everybody working on this problem. And it's no secret, right, that right now we've been waiting forever for GPT-5. Meta had some problems with Llama. Anthropic has been trying to tell us there's a new Claude Opus model coming out forever. We haven't seen it. So clearly a lot of the research houses, maybe with the exception of Google, are struggling with what you get when you make the models bigger.

And so I just want to ask you about that. I mean, it seems like it's nice that there are all these techniques. But again, thinking about this one technique that was supposed to have limitless potential, is that a disappointment for the generative AI field overall, if that's not going to be the case?

Yeah, I really don't think about it that way, because we have been able to push the capabilities of the models quite effectively, right? I think in a way, the whole scale discussion starts from the scaling laws, right? Like, scaling laws describe the performance of the models as a function of data, compute, and number of parameters, right? And researching all three in combination is the important thing. And when I look at

the kind of progress that we are getting from that general technology, I think it is still improving. What I think is important is to make sure that there's a broad spectrum of research that is going on across the board. And rather than thinking about scaling only in one dimension, there's actually many different ways to think about it. And investing in those

And we can see the returns that I think across the field, really, not just here at Google, but across the field, many different models are improving with quite significant steps. So I think as a field, the progress has been quite stellar. I think it's very exciting. And in Google, we are very excited about the progress that we have been having with Gemini models.

going from 1.5 to 2 to 2.5, I think we have made very steady progress, very steady improvement in the capabilities of the models, both in the spectrum of capabilities that we have, but also at the quality level for each capability as well, right?

What I'm excited about is we are pushing the frontier all the time and we see returns in many research directions and many different dimensions of research directions. And I'm excited that there is actually, I think there is a lot more progress to do and there's a lot more progress that needs to happen for reaching AGI as well.
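
For reference, the scaling laws Kavukcuoglu mentions are usually written as a parametric fit of model loss against model size and training data; one commonly cited form, the Chinchilla-style fit, is sketched below. The constants are empirical, and the exact form used for Gemini is not stated in this conversation, so treat this as a generic illustration rather than Google's formula.

```latex
% Generic Chinchilla-style scaling law: loss as a function of
% parameter count N and training tokens D (E, A, B, alpha, beta are fitted constants).
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% With a fixed compute budget C \approx 6 N D, minimizing L(N, D) over N and D
% gives the compute-optimal balance between model size and training data,
% which is the "researching all three in combination" trade-off described above.
```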

We had Yann LeCun on the show a couple of weeks ago. You worked in Yann's lab. Yann emphatically stated there is no way the AI industry is going to reach human-level intelligence, which is his term for AGI, just by scaling up LLMs. Do you agree?

Well, I mean, I think that's a hypothesis, right? That might turn out to be true or not. But also, I don't think that there's any research lab that is trying to only do scaling up the LLMs. So, like, I don't know if anyone is actually trying to negate that hypothesis or not. I mean, we are not. From my point of view, we are investing in such a broad spectrum of research that

that I think that is what is necessary. And clearly, I think, like many of the researchers that I talk to, that there are a lot more critical elements that need to be invented, right? So there are critical innovations on our path to AGI that we need to get through. That's why we are still looking at this as a very ambitious research problem.

And I think it is important to keep that kind of critical thinking in mind. With any research problem, you always try to look at multiple different hypotheses, try to look at many different solutions. A research problem this ambitious is probably the most important problem that we are working on in our lifetimes, right? It is maybe the hardest problem we are working on

as a problem, as a research problem in our work. I think like having that really ambitious research agenda and portfolio and making investments in many different directions is the important thing. From my point of view, what is important is defining where the goal is

that our goal is AGI; our goal is not to build AGI in a particular way. What's important is to build AGI in the right way, in a way that is positively impactful,

so that, building on it, we can bring a huge amount of benefit to the world. That's why we are trying to research AGI. That's why we are trying to build AGI. AGI in itself, sometimes it might come across as a goal in itself. The real goal is the fact that if we do that, then we can hugely benefit all of society, all of the world.

That's the goal. So with that responsibility, of course, you put in not just particular... It's not very important to me if that particular hypothesis is important or not. What is important is we reach that by pursuing a very ambitious research agenda and building a very strong understanding of the field of intelligence.

Okay, so let's get to a little bit of that research agenda. One of the announcements that you're making at I/O, which is this week, which just – when this airs, it will just have been made, is that there's a new product called Deep Think that you're releasing, which is relying on reasoning, or as you put it, test-time compute. I think I have that right in terms of what the product is going to look like.

How effective has including reasoning in these models been in advancing them? I mean, would you say, when you think about all the different techniques that you've discussed so far today, scaling included, what sort of magnitude of improvement are you seeing by using reasoning? And talk a little bit about Deep Think.

Okay, I mean, first of all, Deep Think is not a separate product. It is a mode that we are enabling in our 2.5 Pro model so that it can spend a lot more time at inference

to think, to build hypotheses. And the important thing is to build parallel hypotheses rather than a single chain of thought. It can build parallel ones and then can reason over multiple of those, build a hypothesis, build an understanding over those, and then continue building those parallel chains of thoughts.

But this one thinks a little bit longer than your traditional reasoning model? It will. I mean, in the current setup, yes, it takes longer. And it takes longer because understanding those parallel thoughts and building those parallel thoughts is a much longer process. But one thing that we are also

positioning it as is: right now it's research, right? Like we are sharing some initial research results. We are excited about it. We are excited about the technique and what it can actually enable in terms of new capabilities and new performance levels.

But it's early days, and that's why we are only sharing it right now. We are going to start sharing with safety researchers and some trusted testers because we want to also understand the kinds of problems that people want to solve with it and the kinds of new capabilities it brings and how we should train it the way that we want to train, right?

So it is early days on that, but it is what I think is an exciting research direction that we found in the inference-time thinking model space. Yeah, so can you talk about what precisely it does differently than traditional reasoning models? The current reasoning, thinking models, most of the time, at least speaking from our research point of view, build a single chain of thought.

Right? And then as you build a single chain of thought and as the model continues to attend to its chain of thought, it builds a better understanding of what response it wants to give you. It can alternate between different hypotheses, reflect on what it has done before.

Now, of course, if you think about it just also in a visual kind of space, one kind of scalability that you can bring onto the table is can you have multiple parallel chains of thoughts so that you can actually analyze different hypotheses in parallel, and then you will have more capacity exploring different kinds of hypotheses, and then you can compare those.

And then you can eliminate the ones or you can continue pursuing and you can sort of expand on particular ones. It's a very intuitive process in a way, but of course it is more involved.
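
A rough way to picture the parallel-chains idea described above is the following sketch: keep several chains of thought alive at once, extend them, score them, prune the weak ones, and branch from the survivors. This is only an illustrative approximation of the concept, not Google's Deep Think implementation; `extend_thought` and `score_thought` are hypothetical stand-ins for model or verifier calls.

```python
# Illustrative sketch of parallel chains of thought with pruning.
# Not Google's Deep Think implementation; `extend_thought` and `score_thought`
# are hypothetical placeholders for model/verifier calls.
import random
from typing import List

def extend_thought(prompt: str, thought: str) -> str:
    """Placeholder: ask the model to append one more reasoning step to a chain."""
    return thought + f" step({random.randint(0, 9)})"

def score_thought(prompt: str, thought: str) -> float:
    """Placeholder: ask the model (or a verifier) how promising a chain looks."""
    return random.random()

def parallel_thinking(prompt: str, n_chains: int = 4, n_rounds: int = 3,
                      keep: int = 2) -> str:
    """Expand several chains in parallel each round, keep the best, branch from them."""
    chains: List[str] = [""] * n_chains
    for _ in range(n_rounds):
        chains = [extend_thought(prompt, c) for c in chains]        # expand all chains
        chains.sort(key=lambda c: score_thought(prompt, c), reverse=True)
        survivors = chains[:keep]                                    # prune weak chains
        chains = [survivors[i % keep] for i in range(n_chains)]      # branch from survivors
    return max(chains, key=lambda c: score_thought(prompt, c))

if __name__ == "__main__":
    print(parallel_thinking("Prove that the sum of two even numbers is even."))
```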

I just want to cap this segment by asking you in terms of the pace of improvement of models. Like I'm just going to use the OpenAI schema just to give an example. The progress, this is something that everybody who comes on this show says, the progress of going from like GPT-3 to GPT-4 was undeniable. GPT-4 to 4.5, less of a leap. So I want to ask you just in terms of the velocity of improvement, if that's the right way to put it.

Are we coming back down to earth a little bit right now? Again, when I look at our model family, right, going from Gemini 1 to 1.5 to 2 to now to 2.5, I'm very excited about the pace that we have.

When I look at the capabilities that we keep adding, right, like we have always designed Gemini models to be multimodal from the beginning, right? Like that was our ambition because we want to build AGI. We want to make sure that we have models that can fulfill the capabilities that we expect from a general intelligence. So multimodality was key from the beginning.

And we have been, as the versions have been progressing, we have been adding that natural multimodality more and more and more. And when I look at the pace of improvement in our reasoning capabilities, like lately we have added the thinking capabilities, and I think with 2.5 Pro, we wanted to make a big leap in our reasoning capabilities, our coding capabilities,

And I think one of the critical things is we are bringing all these together in one single model family. And that is actually one of the catalysts of improvement, and improvement at pace as well. It's harder, but we find that

creating a single model that can understand the world and then you can ask questions about, "Oh, can you code me this sort of like a simulation of a tree growing?" And then it can do it, right? That requires understanding of all of the things, not just how to code because, again, we are trying to bring these models to be useful, to be usable by a very broad audience.

And I think our pace has been really reflective of the research investments that we have been doing across the board. So no velocity slowdown is what I'm hearing from you. Let me just put it in the way that I'm very excited about everything that we have been doing as Gemini progresses and research is getting more and more exciting. Of course, for us folks who are doing research, it is really good.

Okay, so I want to ask you, you know, you're on the model side. I want to ask you, basically, sometimes we debate on the show what the value is of improving models. So let me just like put a thought experiment to you. What do you think the value of improving these models by 10% would get us? The question there is like, how do we define 10%, right? Like that is where the value is defined already, right?

One of the important things about doing research and improving the models is quantifying progress. We use many different ways to quantify progress, and not every one of them is linear, and not every one of them is linear with the same slope.

So when we say by improving 10%, if we can improve 10% by its understanding in math, understanding of really highly complex reasoning problems, I think that is a huge improvement because then that actually expands the general knowledge. That would indicate that the general knowledge and the capabilities of the models have expanded a lot.

And you would expect that that would make the model a lot more applicable to a broader range of problems. And what about if you improved the model by like 50 percent? What would that get you? Is your product team like saying there are things that we can build if this model was just like 50 percent better? Yeah.

Again, I think we work with product teams a lot, right? That's actually taking a step back. That's a quite important thing for me. Thinking about AGI as a goal, I think that also goes through working with the product teams because it is important that when we are building AGI, it's a research problem.

We are doing research, but the most critical thing is we actually understand what kind of problems to solve, what kind of domains to evolve these models from the users. So that user feedback

and that knowledge from the interaction with the users is actually quite critical. So when our products tell us about, okay, here's an area that we want to improve on, then that is actually quite important feedback for us that we can then turn into metrics and pursue those.

As you ask, I mean, as we increase the capabilities of the model, I think what is important is increasing them across a broad range of metrics, which I think we have been seeing in Gemini, as I said, from 1.5 to 2.5, right? You can see the capability increases there.

Across the board, a lot more people can actually use the models in their daily life, to help them either learn something new or solve an issue that they see. But that's the goal, right? Like at the end of the day, again, the reason we build this technology is to build something that is helpful.

And the products are a critical aspect of how we measure and how we understand what is helpful and what is not. And as we increase more in that, I think that's our main ambition. That's great.

Let's take a concrete example that, again, Google is releasing today, talking about today, which is Veo 3. This is your video generation model. And I think we've really seen an unbelievable acceleration in terms of what these models can do from the first generation to the second generation to the third. And for listeners and viewers, what Google is doing now is not only are you able to generate scenes, you're able to generate them with sound.

And having watched one of these videos or a couple of them, I can tell you the sound matches. And then there's this other crazy product that Google's putting out. I think it's called Flow, where you could just extend the scene that you've generated and storyboard out like your own basically short film. So I'd love to hear your perspective on how this happened.

And is this like – I kind of asked you what do we get at 10%, 50%? But is this kind of that perfect example of the model getting better, producing something that goes from – that's a fun little video to like, oh, I can really use this now. I think –

The main difference, the main progress going from Veo 2 to Veo 3, and from Veo 1 to Veo 2, it was a lot more about understanding the physics and the dynamics of the world. With Veo 2, I think for the first time we could comfortably say that for many, many cases, right, the model has understood the dynamics of the world well. That's very important, right? Like to be able to have a model that can generate data

and complex scenes where there's a dynamic environment happening. And also there are interactions of objects happening.

I remember one of the things that was quite viral was the tomato-cutting video generated by Veo 2. It was so precise, it looked so realistic: a person was slicing tomatoes, and you could see the dynamics there, not just how any single object like the hand moves, but also the interaction between different objects, the blade, the tomato, how the slice falls down and everything.

It was very precise, right? So that interactive element was important. Understanding the dynamics is about not just understanding the dynamics of a particular single object, but it's also multiple objects interacting with each other, which is much, much more complex.

So I think there we had a big jump. With Veo 3, I think we are doing another jump in that aspect. But I see the sound as an orthogonal, new capability that is coming in. Of course, in the real world we have multiple senses, and vision and sound go hand in hand.

They are perfectly correlated, we perceive them all at the same time, and they complement each other. So to be able to have a model that understands that interactivity, that complementarity, and being able to generate scenes and videos that can generate both at the same time, I think that speaks to the new capability level of the model. And the quality, I think this is the first step.

There are very impressive examples. There are examples that fall a little bit short of what you would say is really natural. But I think this is an exciting step in terms of expanding that capability. And as you said, I think I'm excited to see how...

this kind of technology can be useful, right? Like you just said that, oh, it is becoming useful. I think that is great to hear, right? Like now this is a technology that can be built on. And I think Flow is an experiment in that direction, to give it to users,

for people to experiment and build something with it. Yeah, you prompt a scene and then it creates a scene, then you prompt the next scene, and you can continue to have a story flow, which is a good name for it. All right, this next question comes to me from a pretty smart AI researcher. They basically talked about how there's a basic tension between open source and proprietary. And

And, of course, we have companies like Google that's building, you know, obviously "Attention Is All You Need," the Transformer, came from Google. Now Google's building proprietary models. We saw DeepSeek push the state of the art forward, you could argue. So this person wanted to know, and I think it's a really good question, is there coordination possible between open source and

proprietary, maybe we see OpenAI doing their new open source model or teasing it, or should each sort of side try to get its own part of the market? What do you think? I think, like, I want to say a couple of things, right? Like, first and foremost, again, like, take a step back. There's a lot of research that went into building this technology, right? Like, of course, like, in the last, like, two, three years,

I think it became so accessible and so general that people are using it in their daily lives. But there's a long history of research that built up to this point.

So, as a research lab, Google, and before, of course, there were DeepMind and Google Brain, two separate labs that were working in tandem on different aspects. And many of the technologies that we see today have been built as research prototypes, as research ideas, and have been published in papers. As you said, Transformers, the most critical technology underlying all of this,

and then models like AlphaGo, right? AlphaFold, all of these kinds of things, all these research ideas have been evolving into building the knowledge space that we have right now. All that research, I think publications and open sourcing all those have been a critical element because we were really in the exploratory space at those times. Nowadays, I think like the other thing that we always,

need to remember is actually we have at Google, we have our Gemma models, right? Those are open-weights models, just like the Llama open-weights models. We have the Gemma open-weights models. The reason to do those for us is also that there's a different community of developers and users who want to interact with those models, who actually need to be able to download those weights into their own environment and use that and build with that.

So I feel like it's not an either or. I think there are different kinds of use cases and communities that actually benefit from different kinds of models. But what is most important is at the end of the day, in the path towards AGI,

Of course, it's important that we are being conscious about what we enable with the technologies that we develop. So when we develop our frontier technologies, we choose to develop them under the Gemini umbrella, which are not open weights models, because we want to also make sure that we can be responsible in the way that they are used as well. Right? Right.

But at the end of the day, what really matters is the research that goes into building the technology and doing that research and pushing the frontier of the technology and building it the right way with the positive impact. And I think it can happen both in open-weight ecosystem or in the closed system. But I think when I think about all the...

sort of the umbrella of things that we are trying to do. We have quite ambitious goals, building AGI and doing it the right way with the positive impact. That's how we develop our Gemini models. Okay, I have like 30 seconds left with you. You're Chief Technology Officer. Are you a fan of vibe coding? Yes.

Exactly. I find it really exciting, right? Because what it does is, all of a sudden, it enables a lot of people who do not necessarily have that coding background to build applications. It's a whole new world that is opening, right? Like you can actually say, oh, I want an application like this. And then you see it.

You can imagine what kinds of things could be possible in the space of learning, right? You want to learn about something, you can have a textual representation, but you can also ask the model to build you an application that explains certain concepts to you, and it would do it, right? And this is the beginning, right? Like some things it does well, some things it doesn't do well, but I find it really exciting. These are the kinds of things that the technology brings. All of a sudden, like the

Like the whole space of building applications, the whole space of building dynamic, interactive applications becomes accessible to a large, broader community and set of people.

All right. Great to see you. Thank you so much for coming on the show. Yeah. Thank you very much. Thanks for inviting me, Alex. Definitely. We'll have to do it again in person sometime. All right, everybody. Thank you for listening. We'll have Demis Hassabis, the CEO of Google DeepMind, on tomorrow. And so we invite you to join us then. We'll see you next time on Big Technology Podcast.