- Sholto Douglas was a key part of Anthropic's Claude 4 models. It was really fun to sit down with him on the day these models got released. We talked about a bunch of things, including how developers and builders should think about this next generation of Anthropic models. We talked about what the trend lines mean for where these models will be in six, 12 months, two, three years from now. We hit on what's required for reliable agents and when these models will get better in domains like medicine and law,
kind of mirroring the advances they've already made in coding. And then we hit on his views on alignment research, where we are today, what's working, what still needs to be done, and his reaction to the AI 2027 work. This was just a fascinating conversation with a brilliant mind in LLM research. I think people will really enjoy it. Without further ado, here's Sholto.
Well, thanks so much for coming on the podcast. No, it was a joy. It's a really cool little room. Yeah, no, I appreciate you getting in this little cave with us. It's always fun. By the time this podcast comes out, the world will have Claude 4. I'm sure people will play around with it, but I'm curious. You're one of the first people to get to play around with these models.
What gets you most excited about them? So they're another step up in software engineering, that's for sure. And Opus is really an incredible software engineering model. More and more, I have these moments where I go and ask it to do something incredibly ill-specified in our large monorepo, and it's able to go and do it in a quite autonomous and independent way, like going and discovering the information and figuring things out, running tests.
That blows me away every time. - Every time we get a new set of models, we have to re-characterize our mental model of what works, what doesn't. How has your model now changed of when you're coding, what you use these models for and don't, with this leap? - So I think the biggest one is the time horizon expands a little.
So I think you can characterize model capability improvements on two axes. One of those is like the absolute intellectual complexity of the task. And the other one is the amount of context or the amount of like successive actions that they're able to like meaningfully reason over and include. And these models feel substantially better along the second axis. Like they're really able to
take multiple actions and figure out what information they need to pull in from their environments and then act on those. So giving it, like, it's the time horizon. Also the tooling that we've built, like Claude Code and this kind of stuff. The fact that
it now has access to all of the tools to be able to do this in a useful way, and you aren't sitting there copy-pasting from a chat box, is a pretty meaningful improvement in that regard too. There is a wide variety of tasks where I'm looking at, you know, an hour plus, or many hours, of work that I would have done, and it's just there churning away in front of me doing them, in terms of human-equivalent time. People are going to get these models when this podcast comes out for the first time. What is your advice on the first thing they should try? First thing they should try.
I think, honestly, try and plug them into your work. That's the biggest one, is sit down and ask it to do the same thing, the thing that you were about to do first off in your code base that day. Watch as it figures out what information it needs to pull in and figures out what to do, and I think you'll be pretty impressed.
I mean, now that you have these new capabilities, obviously you have tons of people that build on top of these models. What are you hoping is newly enabled for builders that take these models and build applications? So I think there's this concept of a product exponential in some respects, where you have to be constantly building just ahead of the model's capabilities. And I like to think about this in terms of, say, Cursor and Windsurf and Devin and these products.
If you look at Cursor, they had a vision for what coding would be that was substantially ahead of where the model capabilities were for a while. Cursor didn't hit PMF until the underlying models, like Claude 3.5 Sonnet, took off such that the assistance that they wanted to give people was able to be realized. And then Windsurf
went, I would say, substantially more agentic. And that enabled them to get a reasonable slice of market share by really pressing harder on that product exponential. What we're starting to see now with Claude Code, but also with the new Claude GitHub integration and with OpenAI's Codex, and also Google's coding agents. Everyone's really into coding agents.
Jules, right? Jules, right. It's people building for another level of sort of autonomy and asynchronicity. And so right now, the models are taking these stumbling steps towards being able to do tasks independently of you, the kind of tasks that would have taken you several hours before. What that looks like next, I think, is...
There's this interesting transferral of you are in the loop every second to you are in the loop every minute to you're in the loop every hour that we've seen over the course of the last year. And I wonder if it doesn't look like you're managing a fleet of models in future. And so I think that kind of interface would be very interesting to explore. Just how much parallelism can you give someone when it's not a single model they're managing, but multiple models doing multiple things and interacting with each other? I think that would be pretty exciting. Yeah. Have you seen it? What might that look like?
What might that look like? Oh, God. I mean, I know a lot of people actually at Anthropic who have multiple cloud code instances up in different dev boxes, which is pretty cool. But I think no one's really cracked that form factor yet. I think that's an interesting form factor to explore of what is the almost management bandwidth of an individual. I think this is also an interesting question to explore from the future of how does economics even work or what are the return on productivity of these models because
If you think, like, we initially will need humans to verify the outputs of these models. And so the economic impact of the models will be, at some initial point, bottlenecked by human management bandwidth, until you get to a point where you can delegate trust to a model to itself manage teams of models. And so that continual step up in the hierarchy of abstraction layers will be, I think,
one of the more important trend lines. - Yeah, so basically you have, based on the frequency with which you need to check these models, you become a gating factor. You have an infinite number of models running, and if you have to check them every 15 minutes versus every hour versus every five hours, you can do a lot more. - Yeah, exactly. I think Jensen mentioned this with respect to how he felt about the future of AGI and progress and this kind of thing. He said, well, actually, I am surrounded by 100,000 incredibly intelligent AGIs.
And he's like, and so like, this gives me huge leverage over the world. And like, that's sort of the impact. I mean, you know, he's describing how like he himself is like this gating factor in like managing the company of NVIDIA. I think a lot of work ends up looking close to that direction. Yeah, who knows? I mean, maybe this like, you know, this whole field of org design ends up being actually the most important. Right, right. Exactly. And how do you like...
and like trust and, and yeah, exactly. The org structure becomes, it becomes complicated. Yeah. I know you were saying before the episode that you did spend a year at McKinsey, and I thought, hey, maybe this is a good use case for the consulting firms, you know, who have been doing years of this; maybe it's a good new product line for them. Yeah. Actually, I was really struck by this, you know, what you just said about how basically, for the app companies, it's about being a stage ahead of where the models are going, and, you know, the models change so quickly that it's almost, if you think about, like,
what Cursor did versus the agentic coding companies like Cognition, to maybe someone now thinking through what is the dashboard that you use to manage your hundred agents. What is the right amount ahead, in your mind, to be? Because you may feel like, "Oh God, I'm really out over my skis today." Then in three months, you'll be like, "Actually, I'm way behind where the model capabilities are." You have to constantly reinvent the product to be suitable for the frontier of model capabilities.
A few months ahead, maybe. I think this is sensible. So you still maintain a lot of contact with direct users and this kind of stuff, and the product works to some degree, but then it allows you to take advantage of the frontier capabilities. - I feel like that's the risk because while you're waiting for the models to get somewhere, someone else is taking up your customer-- - Right, right. - The developer love and your customer base, and they can probably integrate in some of the stuff as it is. - Right, exactly. And you saw that also with Cursor and Windsurf, this kind of thing, right?
There's a lot of things in these models that you guys made progress on, like memory, instruction following, tool use. So I guess, again, kind of recontextualizing for folks, like where are we in these three areas? What works? What doesn't? Yes. Okay. So a good way to think about
what's happened with these models over the last year is because RL finally really worked on top of language models, there's, I think, no direct ceiling to the intellectual complexity that we've been able to teach these, like the intellectual complexity of tasks with which we've been able to teach these models. So you see them doing incredibly complex math problems, incredibly complex coding problems. But those things are in scoped domains where it's like we're at relatively limited context. The problem's there in front of the model.
Things like memory and tool use, these are attempts to expand the context within which the model is able to act and the affordances it has. So with things like MCP, suddenly the world opens up to it and it's able to interact with the outside world. Memory allows it to run with much longer context, much greater degrees of personalization than just a raw model with its own context window. And so I think
those efforts represent attempts to crack agency by giving the model all these unhobblings, in one respect. And I think the Pokemon eval is quite a good... I love that eval. It's great. As an avid Game Boy player back in the day, I feel like it's a great one. I hope you're going to release that alongside this model. Yeah, yeah. The new model has been playing Pokemon, so you'll see that.
I think it's a great eval because it hasn't been trained for. And so it demonstrates this generalizability of intelligence. I mean, a task which is not completely out of distribution, but one which is meaningfully different from anything it's done before. And another example is... I'd buy a strategy guide to beat that game.
I remember. There's a lot of ladders and places to go around. Exactly. Another example of this that I really like is there's been a recent interpretability agent that Anthropic's been working on. Basically what this does is it does that job of finding circuits in language models. And this is really, really cool because we haven't trained it to do this specifically. We've trained it to be
a coding agent, but it's able to mix that with its knowledge of theory of mind and this kind of thing to sit there and itself talk to the model that it's trying to understand and try to reason through it. It has access to tools for things like visualizing neurons and circuits. It is actually able to win this interesting,
like alignment safety eval, which is called the auditing game, where you twist the model in some way and it has to figure out what is wrong with the model. And it is able to do that. It's able to talk to the model, generate its own hypotheses about what might be wrong with the model and look at all these tools. I think it's just such a brilliant demonstration of...
of the generalizable competence of these models with access to tools and memory. Totally. I feel like builders have been waiting for agents and the ability to use this stuff reliably. I think you've talked before on podcasts about agent reliability, right? Yes. How much progress have we made there, for the builders that are listening?
Hey guys, this is Rashad. I'm the producer of Unsupervised Learning. And I just wanted to take a quick break from the conversation. You probably know what I'm going to say. If you haven't already rated the show, ratings are maybe the number one way that you can support the show and help us grow, as well as sharing the episode with someone that you think would find value in it. The way you rate the show is by going to the homepage, hitting the three dots, and then hitting rate show on Spotify.
Again, super helpful in helping us grow and continue to bring on the best guests like Sholto. So thank you so much for listening. And now back to the episode.
And I think we're making a hell of a lot of progress. We're not 100% there on reliability. These models don't succeed all the time. There's still a meaningful gap between the performance of the model when you ask it to do something once versus when you ask it to try 256 times. There are many evals you can completely solve with many attempts, but on the first try it's not guaranteed. But that being said, I think every trend line I'm seeing says that we are on track to get to expert, superhuman reliability at most things that we train on. Yeah. What would change your mind on that? I think if we were to basically fall off the trend line. So if, let's say, by the middle of next year, you started to see some kind of
a block on the time horizon with which these models are capable of acting. I think you should look at that like coding is always the leading indicator in AI. So I think you would see that drop off in coding first. But that would be perhaps reflective of
inherent limitations in the algorithm, which I really strongly believe there aren't. There are other limitations where the task distribution might be harder than you think because there's less data available for something and it turns out to be actually quite a laborious process. So maybe if you think about the computer-using agents, this would be an example where that kind of data doesn't natively exist. But at the same time also, we're seeing just such incredible progress there that it feels to me relatively unlikely. I don't think that's the world we're in at all. When do you think I'll have one of these general purpose agents that can fill out all my forms for me and navigate the internet for me? Yeah. One thing I joke about is personal admin escape velocity. How can I put off doing a task? Exactly. As a procrastinator, that would be wonderful. Right, exactly. It depends. I still think there's a meaningful...
This depends a bit on whether or not a company focuses on putting at least something in, like giving the model some practice reps. If you took a human off the street and you were like, you're a general intelligence, but I'm going to ask you to do my accounting and you're not going to make any mistakes, probably the person you pull off the street makes some mistakes. But if they've done something similar to it, or they're a great mathematician or something like this, then probably they won't make as many. Or they're a lawyer or whatever. If there's something for them to generalize from and map
to, then they'll be able to do it with a much higher degree of likelihood. So it strongly depends on the task. By the end of next year, I think it should be very obvious that this is near guaranteed. Even by the end of this year, that should be pretty clear. But by the end of next year, you'll have these things going around, doing a lot of things for you in your browsers. It'll be, yeah.
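He mentioned the gap between asking a model once and letting it try 256 times; as a rough aside, that gap is usually quantified with the unbiased pass@k estimator from the Codex paper. A minimal sketch, with purely illustrative sample counts:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples is correct,
    estimated from n total samples of which c passed."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw of k must contain a success
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Illustrative numbers only: suppose 40 of 256 attempts on some task succeeded.
n, c = 256, 40
print(f"pass@1   ~ {pass_at_k(n, c, 1):.2f}")    # single-attempt reliability
print(f"pass@256 ~ {pass_at_k(n, c, 256):.2f}")  # many-attempt ceiling
```

The spread between those two numbers is the reliability gap being described: many tasks are solvable with enough attempts long before they are solvable on the first try.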
Sounds great to me. Your models are really good at coding. What makes them so uniquely good at coding? Is it a prioritization internally? I think people associate Anthropic as the coding model company these days. What is behind that? Anthropic does care a lot about prioritizing the things we think are important. And we believe coding is extremely important. Because coding is the...
that first step in which you will see AI research itself being accelerated. And so we care a lot about coding. We care a lot about measuring progress on coding. We think it's the most important leading indicator of model capabilities. Yeah, I think it's a focus. Are these agents accelerating AI research today? It accelerates me a lot.
Yes, basically yes. They accelerate the engineering a lot. I think it's interesting to ask even people who are utterly brilliant engineers how much this is accelerating. A lot of my friends who I would regard as the strongest people I've ever worked with, they say it's like 1.5x even on domains they know well. And on domains which they don't know well, it's like 5x. So if it's new programming languages or something you haven't done for a while, it's an incredible accelerant. Now, it's
One very important factor to consider, on how much AI will accelerate AI progress, is how much you believe that we are compute bound or not. Whether you think that
if you deploy AI agents who can do the research for you, will that mean that you get gains proportional to the amount of researchers that you're now deploying? At this stage, I imagine it's like most of these things can do the annoying parts of your job so you can think about the brilliant pieces of research to go test. I mean, do you find...
What do you think the timeline is for these agents themselves proposing interesting research directions? I mean, a lot of the work is engineering work. I would say the majority of the work is engineering work at this point in time. When they're proposing novel ideas,
I'm not sure, to be honest. Within the next two years, I think people are already starting to see interesting scientific proposals and this kind of stuff. I think also if you allow... An important thing to consider in the current space of algorithms, these models, is they can become truly expert at something provided they've had a feedback loop for that thing. So it needs to have been allowed to practice a little bit in the same way that humans need to. Well, it also needs to be relatively easily verifiable, right? Are we going to get these models that are like,
unbelievable coders and haven't made the slightest progress in some of these more nebulous fields. Yeah. One point is that ML research is actually incredibly verifiable. Did the loss go down? So if you can get to the point where you can make meaningful proposals for ML research, you have the best RL task in the world. Even more so, I'd say, than general software engineering in some respects.
Will we get progress on less verifiable domains? I'm very confident that we will. I think one interesting data point here is OpenAI's recent paper on, it was like, responses to medical questions. Yeah. Did you notice how it was evaluated, how it was scored?
But they had the new medical evals that they put forward. Yeah, the new ones, right. And they had graded rubrics. So they had all these questions where it was the kind of long-form answer that you'd have in an exam, and they gave points for it. And so this is taking a domain which is not inherently verifiable in the same way code or math is, and converting it into something which is much more verifiable. I think this is reasonably likely, basically near guaranteed, to get solved eventually.
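As a rough illustration of the rubric idea being described here (turning long-form answers into something gradable that can then act as a reward signal), a minimal sketch; the rubric items and the `judge` callable are hypothetical stand-ins for a grader model or a human expert, not anything from the paper itself:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricItem:
    criterion: str   # what the answer should contain or avoid
    points: float    # points awarded if the criterion is met

def grade(answer: str, rubric: list[RubricItem],
          judge: Callable[[str, str], bool]) -> float:
    """Score a free-form answer against a rubric.

    `judge(answer, criterion)` decides whether a criterion is met; in practice
    this would be a grader model or an expert reviewer. Returns the fraction
    of available points earned, which can be used directly as a reward."""
    earned = sum(item.points for item in rubric if judge(answer, item.criterion))
    total = sum(item.points for item in rubric)
    return earned / total if total else 0.0

# Hypothetical example rubric for a clinical question.
rubric = [
    RubricItem("mentions the most likely diagnosis", 3.0),
    RubricItem("recommends an appropriate first-line test", 2.0),
    RubricItem("flags red-flag symptoms requiring urgent care", 2.0),
]
```

The point is only that a fuzzy domain becomes "verifiable enough" once you can attach a consistent score to each output.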
When's eventually? When will we have a really good medical or legal model? Or I guess, does that just become part of the broader model? Oh, within the next year. Yeah. Does it become part of the broader model, or do you think there's like, "Oh, this is the legal-specific one or medical-specific one?" I'm a bit of a large model maxi in this respect. I think most researchers are. Most researchers are. Yeah, exactly.
I do think there's a lot of really interesting ways in which personalization of models matters a lot, right? Like you want something that understands your company, understands the things you care about and understands you yourself. And so there's a lot of ways in which tuning models for your things does matter. But I think this won't be industry specific things so much as company or individual specific things. And I think Anthropic has a partnership with Databricks where we're doing company specific stuff. But yeah,
at a base level of capabilities, I firmly believe that it's single raw large models. I think this for a number of reasons. One, it's the trend that we've seen so far. But two, because there's no reason in the long run for the distinction between small and large models to exist. You should be able to adaptively use the right amount of work, so to speak, the right amount of flops for the difficulty of a given task. And so I think that
means that bias is towards larger models. - It seems like you're pretty convinced in the continued improvement of these models. - Yes. - I think a lot of people speculate it's like, okay, the models will keep getting better, and then like, how does that diffuse into society? And I guess one thing people like to talk about is basically impact on GDP, right? And like, you know, in the next few years, like, what impact on world GDP do you think these models have? - Yeah, okay.
I think probably the initial impact looks something like China's emergence. Because there's going to be like... Which is the thing that has probably most impacted world GDP in the last 100 years. You look at Shanghai over the course of 20 years and it dramatically transforms...
But, and this will be dramatically faster than that, but you'll see that. There's important distinctions to be made here. One is that I think we're near guaranteed at this point to have, effectively, models that are capable of automating any white-collar job by like 2027, 2028, or near guaranteed by end of decade. That being said, that's because those tasks are quite amenable
to our current suite of algorithms. You can try things on computers many times. There's a wealth of data available for this. The internet exists, but that same resource of data doesn't exist for, say, robotics or for biology. And so for a model to be a superhuman coder, you just need the affordances which we've already been able to give the models. And you need to take the existing algorithms and scale it up.
For a model to be a superhuman biological researcher, you need automated laboratories where it's able to propose and run experiments in a hugely parallelizable way. Or for it to become as competent in the real world as we are, you need it to be able to act in the environment through robotics. And so you need a hell of a lot of robots to actually collect the data and do that. So one mismatch that I think we might see-- and I'm actually also worried about seeing-- is you'll see a huge impact on white-collar work.
And whether that looks like just dramatic augmentation, you know, like TBD, but you will see that will change a lot. And we'll need to pull forward the dramatic transformation of things that make our lives a hell of a lot better. So to pull forward medicine, to pull forward abundance in the real world, we need to...
figure out the cloud laboratories and the robotics and this kind of stuff. But by that time, we'll have millions of AI researchers, like, proposing experiments. They don't need such a large scale of robotics or biological data. So AI progress goes really fast. But we need to make sure that we pull in the feedback loop to the real world to actually deliver on meaningfully changing world GDP and this kind of stuff. Yeah. So you basically think for each white-collar profession, you'll be able to build some sort of reward model, similar to how it's done in the software engineering evals. And I think what's always surprising is how little data you need to actually build those things, in the same way that a human learns to do this on relatively limited data. Right, exactly. And even, I think...
What we've conclusively demonstrated is that, so far, we haven't yet hit an intellectual ceiling on the tasks which we're able to teach the models. Now, they do seem somewhat less sample efficient than humans. That's also okay, because we can run thousands of copies of them in parallel, and they can be interacting with different variations of tasks. They can have lifetimes of experience. And so it's okay if they're less sample efficient.
Yeah. Because you still get expert human reliability and performance at that task. It seems like you think this paradigm kind of gets us, you know, pretty much all the way there. Yeah. You know, obviously you have folks like Ilya who have been saying, look, there needs to be some sort of other algorithmic breakthrough. What's the other side here? Yeah, makes sense. I think most people in the field currently believe that
the pre-training plus RL paradigms which we've explored so far are themselves sufficient to reach AGI. We haven't seen the trend lines bending yet, it works, this combination of things.
Whether there are other mountains to climb that could get us there faster, it's entirely possible. I mean, Ilya's invented, like, maybe both of these paradigms before. So who am I to bet against him, right? Every piece of evidence I see says that these are sufficient. You know, maybe Ilya is betting that way because he wants to, you know,
you know, like he doesn't have as much capital available, or he thinks that this is a better way to do it. Entirely possible. Like, I'm not going to bet against Ilya. Yeah. But I do think what we have now will get us there. The limiting factor on this will be energy, compute. Like, when do you think we start to bump up against that? I think there's a great table at the end of Situational Awareness which details this, where by the end of the decade we start to require really dramatic percentages of US energy production. Like, I think maybe by 2028, like 20% of US energy. And so you can't go orders of magnitude more than that without dramatic changes. This is somewhere I think we need to invest more. I think this is one of the important vectors along which governments should act.
Dylan has this wonderful graph of China's energy production versus US energy production. And the US energy production is flat and China's energy production is like this. Like they're just doing a much better job than we are of building out energy. And so, yes, we...
Yeah. I guess in this current wave of model improvement, like what metrics are, I mean, you know, it seems like it's time horizon based metrics, but like what are the things that are worth hill climbing on right now? Like, you know, as you move from four to whatever comes after four. Yeah. I think in general, I've been impressed by internal company evals. There's many companies which like,
have devised their own version of, like, SWE-bench, let's say. And these are quite rigorous and well held out. So I enjoy hill climbing those. I also think really, really complex tests like FrontierMath are really interesting to watch over the next year, because that represents such a ceiling of intellectual complexity that I think it's interesting. But more and more, I think what matters is
evals are really hard to produce. If we could produce evals which meaningfully capture the time horizons of people's work days, I think that would be the best thing to produce. But no one has gone out there and produced that in public.
This is another thing which I think governments should do because I think understanding what the trend line looks like is such an important input into policy. And this is also something which governments are well placed to do where they should be producing like, what does the inputs and outputs look like of an hour or a day of a lawyer or an engineer's daily work day?
And can I convert that into something that's gradable and so that we can actually measure progress against it? I guess on the set of problems that you have to overcome as a foundation model company, where does having good evals rank on the list? Yeah. I mean, every foundation model company has a really big evals team full of great people working incredibly hard to do this. I mean, I think the core...
the core challenges are the algorithmic and infrastructure challenges of even training the thing. But without good evals, you don't know what your progress is at all. And it's hard to keep external evals fully held out. So it's important to have good internal evals that you trust. But also, I'm struck by having people building applications on top of your models that are willing to share the way they think about evals. It's incredibly helpful to--
Exactly. Because obviously, especially as you get into a lot of these different verticals that you might want to improve on, it's hard for you guys to figure out what is the specific thing in logistics or legal or accounting or whatever it is. And requires such expertise and taste. I think that's another one of the stories of the last couple of years is that you went from...
outputs being, you know, you could put anyone off the street and say, hey, which output do you prefer, and it would meaningfully improve the model, to needing, like, grad students or experts in their field to be able to improve the outputs of the models. I mean, if you put me in, you know, some field that I don't know very well, like biology or whatever, and put two model outputs in front of me, I would struggle on a
on a lot of them. Like, I wouldn't have the expertise to know which one is a better answer. I guess this idea of taste, I mean, I'm struck by, like, you know, the way, obviously, you've seen memory now put into a lot of the way that consumers interact with these models. But it seems like part of the reason different AI products seem to have taken off is they struck a chord in the zeitgeist. I guess you guys had this with your Golden Gate Claude example, and there's been tons of other things like this.
What does this look like in the future in terms of model customization, I guess for the vibe of the end users? I think actually there's a weird sort of future where these models end up being one of your most intelligent and charismatic friends. I don't know about your friends, but they're already pretty close. And so I hope that, and I think almost none of our models are like,
they're decent along these axes, but I know many people who spend a lot of hours talking to Claude, actually. But I think there's so much further that we could go. And I think we haven't, we've explored 1% of the depth of personalization and understanding of the model could have of you. How do you get better at that? Is that people
people that had just, like, exceptional taste being opinionated in the way that they're steering these models, or how would you even go about solving that? I mean, I think a large part of the reason why Claude is so good in that way is Amanda and her taste. And I think, similar to beautiful products, an important part of that is singular taste. And yeah,
We've all seen the perils of A/B feedback mechanisms and thumbs up, thumbs down; they just lead you down a dark path, basically. I think, in part, these models are such wonderful simulators in some respects. They've been asked to model the entire distribution of the internet. So I think one of the ways this gets solved is just by providing an extraordinary amount of context about yourself; the models should actually almost automatically be really good at understanding what you want. And then in designing the personality and this kind of thing,
probably individuals with taste, and then, like, you know, your own sort of conversations and feedback with the model. Some combination thereof. I'm sure you had a bunch of people playing around with these models before they're released. Any stories that particularly resonated? I think it's just that everything has been a noticeable step up in my confidence in turning to the model first, I suppose. I have also enjoyed, I think, how
relentless these models are in some ways. I mean, is that a good word? I don't know. But this is great. We have this great eval where in this eval
The model is meant to fail. It's, like, something in Photoshop or whatever, and it's not meant to be able to do that thing in Photoshop. And so the model goes, oh, well, I know I can't do this in Photoshop, so I'm going to download this Python library, and I'm going to do it with the Python library and then upload it into the Photoshop thing. And look, hey, I've done it. And so maybe it's not relentless. It's, like, creative and, like, mischievous. Yeah, like something unexpected. Exactly. Like, I thought that story was pretty cute. Yeah.
That is really cool. So, I mean, obviously you've got these new models out today. What do the next, like, six, 12 months look like? Your best guess. So, the next six to 12 months very much look like, you know, scaling up RL and sort of exploring where that gets us.
And I think you should expect to see incredibly rapid advances as a result of this. It is in many respects, I think, like Dario outlined in his essay about DeepSeek, where he said that comparatively small amounts of compute have been applied to the RL scaling regime compared to the pre-training regime. And this means that there are still such huge gains to be made even with existing pools of compute. And the pools of compute are dramatically multiplying this year as well.
So expect to see continual rises in model capability. Basically, by the end of this year, one good metric will be that the coding agents that are taking their first halting steps today should be very competent. You will probably feel very confident in delegating substantial amounts of work for hours on end. What's going to be your check-in time, like?
Yeah, exactly. What does the check-in time look like? And at the moment with Claude Code, sometimes it's five minutes. Sometimes it's like you're sitting there watching it in front of you. By the end of the year, it's probably several hours of confidently doing this for many things. Whereas now, yeah, sometimes models are able to do several hours. Sometimes they're able to do huge amounts of work. But it's spiky.
Yeah. I feel like that's the game-changing thing. I feel like one of the lessons even from RPA is if you have to sit there and watch something do your work, at some point, you're like, I'd rather just do this myself. Yeah. Sometimes, right? Sometimes you step in. And eventually, we'll be able to delegate that. I think someone tweeted a little while ago that the future of software engineering looks like StarCraft. And I think, yeah. When do we get StarCraft level, your APM of coordinating all your pieces? Yeah.
That's probably the end of the year. So what does that mean then from, like, a model release cadence? I mean, if you guys are scaling this so quickly, how often do you think all the labs end up shipping new models in this period of rapid adjustment? I would expect to see the model cadence substantially faster than last year. In many ways, 2024 was a sort of deep breath in,
as people figured out the new paradigms and did a lot of research and sort of like better understood what's going on. And I expect 2025 to feel meaningfully faster. Yeah. Where...
Particularly also because as models get more capable, the set of rewards available to them expands in important ways. If you have to give feedback on every single sentence that it outputs, this is not very scalable. But if you're able to allow it to do hours of work in such a way that you can just judge, did it complete the thing I wanted? Did it do the right piece of analysis? Did the website work and were people able to message on it, and this kind of stuff? It means that basically it should be able to climb these rungs of the ladder ever faster, even though the complexity of the task is increasing. You mentioned earlier, there's like OpenAI Codex, there's Google's Jules, there's all this different stuff. There's all these startups building out. We're actually launching a GitHub agent. Anywhere on GitHub, you'll be able to say, "Hey, @Claude,"
and it'll spin off and do some work for you. - Yeah, so everyone is competing for the hearts and minds of developers. What do you think will determine which tools and models developers use? - I think a big part of this is the relationship between the companies and developers, how much trust you impart on each other, the trust and respect between the companies and developers.
I think a large part is also the model capabilities, which ones people are actually comfortable with having and enjoy using, like the personalities and the ability, like the competency of the model and the trust you have in it to go off and do these tasks for you. And I hope also that over time...
as the stark capabilities of these models become more and more apparent, the mission of the company as well becomes important. And you think of which companies you're working with as who you're trying to build the future with. I'm not sure, but especially as the cadence of releases keeps going up, it's like every month people will be inundated with, well, this one climbed on this eval and that one climbed on that eval. I think in an interesting way, this is one of the things people didn't expect about...
like, you know, GPT wrappers, right? Is that one of the benefits of wrapping the model companies is that you can surf the frontier of model capabilities. Oh, 100%. I feel like everyone that tried to not be a wrapper just lit a lot of money on fire. Right, exactly. And so...
surfing that frontier of model capabilities is really wonderful. There is a reverse effect where there are certain things you can only predict if you have access to underlying models, like you can really feel and see the trend lines, or you can only build if you... I think all of the deep research equivalents took some amount of RL in such a way that it was hard to build a deep research equivalent product from outside one of the labs.
Can you just explain that actually? Why is that? Because obviously, increasingly, I think, OpenAI has RFT; I'm sure you guys have some equivalent. It seems like they're opening up to the outside world. I guess there's actually a big question that I think about, a lot of people think about, which is what are the labs going to be uniquely good at building? And then what is fair game for anybody, what will the labs try but the apps won't be in as good a position to do? So I think with the release of RFT APIs, this changes a bit, right? Because there is now benefit to companies specializing in domains.
But then there's also going to be those same centralized benefits. I think, at least my understanding is, definitely OpenAI gives people some discount, I think, if they can also train on the model outputs. So there is going to be some centralizing benefit to being the company that has the RFT API and that people are fine-tuning on. And so...
What are the labs going to be uniquely good at? I think a very important part here, so a couple of dimensions. One is that the main metric that the labs will be judged on is how effectively they are able to convert accelerators, and flops and dollars, like capital, into intelligence. That is by far the most important metric. And this is the metric that has sort of distinguished companies like Anthropic, companies like OpenAI and DeepMind, from really the rest of the pack, right? The models that are trained by these companies are better.
The next most important thing after that, I think, will be you're going to have these models that are going to be like employees pretty rapidly. It's going to be the trust and do you like them? And do you trust them to carry out the things that you ask them to do?
So I think that will be an important differentiator. And the personalization will also be an important differentiator. Like how well does the model understand you and your context and your company? I'm sure you have people building like, you know, general purpose agents on top of your models, right? Not being a model company, just being like, we'll take the models off the shelf and we'll do the orchestration. We'll do like really smart chaining. And is that like a doomed task to some extent? You know, even just to articulate, like what is the advantage that the,
model companies themselves will have? Obviously the cost advantage makes total sense versus the API, and, like, you're surrounded by people that know these models deeply well. Yeah, no. I mean,
I think this is actually a good thing also, right? It encourages an incredible amount of competition and finding the right form factors and this kind of thing. I think there are some advantages to the model companies. I think, you know, having access to the models and being able to, you know, really make sure, I think the RFT APIs don't work brilliantly at the moment, so it's like this whole process. So being able to tune the models for things you think are important. But I think the waterline is going to keep going up, basically, of like,
ultimately you are harnessing this intelligence on tap, like an employee that you're hiring or just the raw capability of intelligence. And so, yes, there are going to be companies that wrap and orchestrate these models. And in many cases, they're going to do fantastically well. And I'm not sure actually...
who has the advantage or who doesn't. But the underlying trend is going to stay true. There's this raw intelligence being instilled and made available. And so if a company successfully wraps this API, that's fantastic. It's also going to face a lot of competition. Ultimately, all moats disappear as T goes to infinity, in some ways, because you'll be able to spin up a company on demand, so to speak.
And so I think it's an interesting and complex future where, where does value accrue? Is it in the customer relationship? Is it in the ability to orchestrate and, you know, pull together? Is it the ability to meaningfully convert capital into intelligence? Who knows? I think our listeners would be super curious: can you describe what day-to-day work as a cutting-edge AI researcher looks like these days? Yeah. I think that's a good question. So the fundamental thing that
you are trying to do at these companies is one of two things. It is either to develop new compute multipliers. And so that is the process of doing the engineering, of making the research workflows really fast, and thinking through what issues there are with the model or what sort of algorithmic ideas we'd like to be able to express, and doing the science of studying how those develop. And so there's this very integrated research and engineering
form of work where it's all about iterating on experiments and building experimental infrastructure and making that process as clean and fast as you possibly can. And then there's the process of scaling up. And so this comes with its own host of research and engineering challenges where you take these ideas that you think will work and that you've debated with all your colleagues about what the right ones to include in the riskier run are and
And you scale this up in a much larger run, where this has a whole new set of infrastructure challenges, where especially when you're running at that scale you need to be way more failure tolerant, this kind of thing. And also new algorithmic and learning challenges. So there are things you are only going to see at each successive OOM of scale that you then need to go and figure out scientific reasons for why those occur, and see if you can sort of study the early emergence of those,
and then create experiments that allow you to address or take advantage of those effects and include those in the next large run. So yeah, this constant loop of like,
pushing on those two axes in a way that really combines a lot of science and engineering. So where do you use AI throughout that? One, a lot in the engineering at the moment. Like the primary way it's helping is like in engineering. It is also in implementing research ideas. So I think one way of like seeing the early ability of these models to help here is
If you take a single-file transformer implementation, like Karpathy's minGPT or something like this, and ask the model to implement ideas that you see in papers, you will be stunned by how good it is.
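As a minimal sketch of the kind of request being described, assuming the Anthropic Python SDK is installed; the model name, the local file path, and the specific paper idea (swapping in rotary position embeddings) are placeholder choices, not anything prescribed here:

```python
from pathlib import Path
from anthropic import Anthropic  # assumes the Anthropic Python SDK is installed

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A single-file transformer (e.g. a local copy of minGPT's model.py) keeps the
# entire implementation inside the model's context window.
source = Path("mingpt/model.py").read_text()  # hypothetical local path

prompt = (
    "Here is a single-file GPT implementation:\n\n"
    f"```python\n{source}\n```\n\n"
    "Modify it to use rotary position embeddings (RoPE) in the attention layers "
    "instead of learned absolute position embeddings, and return the full updated file."
)

response = client.messages.create(
    model="claude-model-name",  # placeholder; substitute whichever model you have access to
    max_tokens=8000,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```

The distilled context is what makes this work well: the whole program fits in one prompt, so the model can reason over all of it at once.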
It is just kind of wild. And then if you go into some huge transformer code base and ask it, you'll notice that it's actually a little bit harder. The models struggle a bit more there. But they struggle less and less every month. So that's a good way to presage the future. Distill the context down to just what matters and then ask the model to do this. And you'll be struck by how good it is at helping you do research, basically. You've obviously been really close to this stuff, trying all sorts of things. What's one thing you've changed your mind on in the last year? Yeah.
Over the last year, I think the pace of progress inflected upwards substantially. So last year, I think, you could have been uncertain about whether we'd need many more OOMs of pre-training compute before we get the level of capability that we expect to see really by the end of this year.
And now the answer to that is conclusively no. RL works and the models will get to that drop-in remote worker by 2027. You will have incredibly capable models by then. And so all of the...
both, like, hopes and concerns suddenly become, I think, like they were already real and now they're substantially more real in many ways. Realistically, do you think we end up having to massively scale data, or, by the time you've made, you know, Claude 17 and these coding models are so good, they find so much algorithmic improvement that the amount of additional data we need is not too much? Well, the models might be good enough then. Their understanding of the world might be good enough then that they can
give enough feedback to coach the robots through things, right? There's this concept of what's called a generator-verifier gap, where if it's easier for the model to rate something than it is for the model to do that thing, then you can improve up to your ability to critique or rate. I think robotics is quite potentially one of the areas where this is true.
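A minimal sketch of how a generator-verifier gap gets exploited in practice: if verifying is easier than generating, best-of-n selection (or using the verifier's score as a reward) lifts quality up to the verifier's ceiling. The `generate` and `verify` callables here are hypothetical stand-ins for a policy model and whatever critic, simulator, or test harness you trust.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def best_of_n(task: str,
              generate: Callable[[str], T],
              verify: Callable[[str, T], float],
              n: int = 16) -> tuple[T, float]:
    """Sample n candidate solutions and keep the one the verifier rates highest.

    Quality is bounded by how well `verify` can rank candidates, not by how often
    `generate` is right on the first try -- that's the generator-verifier gap.
    The selected candidates can also be fed back in as training signal."""
    candidates = [generate(task) for _ in range(n)]
    scored = [(cand, verify(task, cand)) for cand in candidates]
    return max(scored, key=lambda pair: pair[1])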
And I think this is also true of many domains, but robotics is like this is starkly true because our progress in understanding the world has gone so far ahead of our ability to manipulate it physically. How would you characterize the current state of alignment research? Interpretability has undergone crazy advances. I know you've been following. There's some beautiful pieces of work here that I've been really, really impressed by.
where, like, last year the state of the field was we were just beginning to discover superposition and features, in the work of Chris Olah and his team, and just like already
that was a significant leap in understanding. But now we actually really meaningfully have circuits in true frontier models. And we can characterize their behaviors, via this beautiful paper on the biology of a large language model, where they break down the ability of these models to reason over concepts in extremely explicit terms. We don't have a full characterization of the models, and there's still a lot of difficult cases here. But also, the models are quite good. One important dynamic to explain here is that, based on pre-training, the models are quite good at just generally ingesting human values. Off pre-training, they're quite default aligned in many ways. Off of RL, that's no longer guaranteed to be the case, because of what you're putting these models through. That same model that was like, hey, I downloaded the Python library and hacked around the fact that I was meant to fail this test,
is the kind of model that comes from the kind of learning process that means the model will do anything to achieve the goal it's been given. And so overseeing that is itself a tricky process that everyone is currently learning to go through. Yeah. I mean, obviously, I feel like about a month ago, AI 2027 came out. A lot of people were talking about that. What was your reaction to that? Honestly, it felt very plausible. I was reading that and, for a lot of it, I was like, yeah, you know what?
This might actually be how it happens. I think there's branching possibilities there, and this is maybe the 20th percentile case for me. But the fact that it's the 20th percentile case is kind of crazy. Is it the 20th percentile for you because you find yourself more bullish on alignment research than them, or do you just think the timeline is slower? I think I am more bullish on alignment research than them for the most part. And maybe my timeline is a year or so slower, but also, in the scheme of things, what is a year? Like, yeah. Yeah.
Yeah, it depends if you take advantage of it. Right, if you take advantage of it and you do the right research and this kind of thing. Yes. If you were kind of playing policymaker for the day, what should we be doing to ensure things are on a better path? Yeah. Okay, that's a good question. The most important thing is you need to really viscerally feel the trend lines that we're all seeing and talking about. And so if you don't, then break down and understand them. Break down all the capabilities you care about in your country and measure the capability of the models to improve on these. Build trend lines that, if those tasks were solved, you would regard as meaningful. Like nation-state evals. Yeah, like nation-state evals. You break down your economy, you've got all the jobs that are done in your country, and convince yourself: build tests such that, if the models could pass them or make meaningful progress towards passing them, that would be your benchmark of intelligence. And plot the trend lines and then go, oh my God, what happens in 2027 or 2028?
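A minimal sketch of what such a "nation-state eval" harness could look like, assuming jobs can be decomposed into gradable tasks; the task fields, `run_model`, and the graders are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str                       # e.g. "draft a standard commercial lease"
    occupation: str                 # the job this task is sampled from
    grade: Callable[[str], float]   # returns 0..1; a rubric, tests, or expert review

def benchmark(tasks: list[Task], run_model: Callable[[str], str]) -> dict[str, float]:
    """Score a model on each occupation's tasks.

    Re-run this periodically with the latest model, log the per-occupation
    averages against the date, and the resulting trend lines are the
    policy-relevant capability measure being described."""
    scores: dict[str, list[float]] = {}
    for task in tasks:
        output = run_model(task.name)
        scores.setdefault(task.occupation, []).append(task.grade(output))
    return {occ: sum(vals) / len(vals) for occ, vals in scores.items()}
```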
The next thing is you should be investing meaningfully in the research that we think will go towards helping make these models understandable and steerable and honest. And so a lot of that looks like the science of alignment, basically. And it's actually something which I've been like,
sad about, in some respects, that it's been driven so much by the frontier labs. There's actually something which I think that-- Can other people work on it? Yeah, absolutely. Like, do you have access to, like, you know, Claude 4 to-- No, no. I mean, I think you can make incredible advances on interpretability. And there are ways, like there's this program called the MATS program, where people have done a lot of really meaningful alignment research, and interpretability in particular, from outside the frontier labs.
But it's something which I think a lot more universities should be thinking about. In many respects, it is closer to the pure science of what's going on in these models. Like this is the biology and the physics of what is going on in language models. Why don't you think there's more? I'm not sure. I really am not sure. I think...
People have described it to me as a bit of a risk. I think the mechanistic interpretability workshop wasn't included in one of the recent conferences, like ICML or something, which is crazy to me because it is the closest thing, in my opinion, to the raw science of what's going on in these models. If you want to discover the chirality of DNA or you want to discover general relativity or something, for me, the tech tree for that in ML and AI looks like exploring mechanistic interpretability. Mechanistic interpretability, really? Yeah. What about the good cases? What are we under-thinking? You know, at a minimum, you're saying we're going to have all white-collar jobs automated in a few years. So, yeah. Well, yeah, that the models will be able to do it. Yeah. But actually, one of the things that's surprising sometimes, well, I mean, not surprising to you, but
The world is surprisingly slow sometimes to integrate these things. Already, the model capabilities are actually quite stunning in many ways. And if workflows were oriented around them, they're still like... Even if model capabilities stalled right now, there would still be just a ridiculous amount of economic value in reorienting the world around using the current level of capabilities. But anyway, that's sort of a side point. This comes back to what I was saying before about we need to...
make sure we invest in all the things that actually make the world better. So this is like pulling forward the material abundance. It's reaching the escape velocity of admin and this kind of stuff and setting up the models to be able to do all those things for us. It's pushing forward the boundaries of physics and entertainment and this kind of stuff. My hope is that people are able to be dramatically more creative than they are now. One of the failure modes, I suppose, of our current society is that
People consume a lot of, like, you know, media and this kind of stuff. But hopefully with these tools, in the same way that you can, like, vibe code, you'll be able to, like, vibe create, you know, a TV show with friends, or you'll vibe create video game worlds. Like, there should be this intensely,
people should feel dramatically more empowered because all of a sudden you're being given literally the leverage of an entire company of incredibly talented models or individuals. And so I'm excited to see what people do with that. I think that is underrated, maybe. There's the aspect of, oh yeah, God, it's going to directly replace the things that are currently done in the economy for work. I think that's very likely. But I also think that everyone should feel like
they will have access to dramatically more leverage. And, like, the world is not solved yet. Like, beyond the sort of work that occurs right now,
everyone's lives could be dramatically better. Solving that, I think, will become the interesting challenge. I love that. Well, we always like to end our interviews with a quick fire round where we get your takes on some overly broad questions, many of which I think we've actually already covered today, but I'll dig into a few others. What do you think is overhyped and underhyped in the AI world today? Okay.
Let's start with underhyped. Underhyped maybe world models, I think, are pretty cool and something that we haven't really discussed in this one. I think you're going to see...
as technology for augmented and virtual reality gets better, you're going to be able to see these models literally capable of generating virtual worlds in front of you. I think that's going to be a pretty wild thing. That requires some sort of physics understanding there, right? Cause and effect, a bunch of things that we don't seem to have yet. I think we've demonstrated physics understanding, to be honest. I think we've meaningfully demonstrated cause, effect, and physics understanding, both in...
in evals of physics problems, but also if you watch any of the video models, they get physics. And even in weirdly generalizable ways, I saw this great video of someone asking one of the video models I remember to put a Lego shark underwater. And it was reflecting the light in the right way off the Lego bricks, and it had the shadows in the right place. And this is something that's never seen before. This is fully generalized physics.
That was pretty cool. - Yeah, that wasn't in the training data. - That wasn't in the training data. There's no Lego sharks underwater. There's no Lego sharks underwater. I'm hopeful also that this same kind of technology translates towards things like virtual cells and that kind of stuff. So I think that's exciting. - You mentioned earlier that even if we stopped
model improvement today, there's just tons and tons of applications that we could build on top, or ways to do it. What do you think are the most underexplored applications, like, God, I wish more people were doing X with these models? I mean, I think they've been felt in software engineering because, one, the models are better at software engineering, but two, software engineers use them more, plus they understand how to solve the problems that they care about.
I suspect there's still a lot of headroom in basically every other field. And you should expect the same to translate. No one has yet built the async background software agent for any other field, right? Or even really anything which comes close to the feedback loops of Claude Code and Cursor and Windsurf and this kind of thing for any other field. So I think probably that.
Yeah. If anything. I guess people say coding is the ideal problem for these models. It is. It's the leading indicator. But you should expect everything to follow, basically. That makes sense. I mean, I guess, obviously, in your time working on this, you probably came to be much more AGI-pilled than you were in the beginning. Has that changed at all, like, the way you live your life or plan your life? I started pretty AGI-pilled. I read this Gwern essay that was really important in convincing me in 2020, actually. Yeah.
The last year of RL progress really did cause substantial inflection in that. Do I live my life that dramatically differently? No, I work a hell of a lot. I think this is the most important thing to work on, and so I devote my life to it, basically. But apart from that, I don't really live my life that differently. We have this funny joke, me and...
my friend Trenton, one delineation between us is that I still wear sunscreen, he doesn't wear sunscreen anymore. He's like, nah, we'll figure out the biology of it. - That's good confidence. - I'm like, you know what?
Biology is hard. The feedback loops for biology are hard. So I'm going to wear sunscreen. Just in case we hit a wall. Just in case biology takes 10 years. I guess you tweeted a picture of you, I think, at the Citadel. Yes. What was up with that? That was a war game. What does that mean? I was invited to hang out with some people from three-letter agencies and military cadets. It was basically, like, gaming out: let's say AGI, you know, comes along and AI keeps getting much better, what are the geopolitical implications of that? Did you walk away, you know, more terrified or less terrified after that experience? Is there enough of that good stuff going on right now? No, honestly, I think I still think that people underrate just how quickly the next few years are gonna go, and also
how much you should prepare, even if you think it's only a 20% likelihood. Like, even when I look at this, I'm like, okay, wait, there's a near guarantee. Like, every trend line I see, every part of the process could be improved so much, that we're basically guaranteed to get there. Do you think, like, 90% of Anthropic thinks the same? Yeah, and GDM and OpenAI. Like, everyone is very convinced that we do get drop-in remote worker AGI by 2027, right? Now, that being said,
even if you don't have the level of confidence that the people working at the labs do. And you're still like, you know what, it's a 10 or 20% chance. You should still plan for that. Like if you're a government or a country, you should still be like, that should still be the number one issue at the top of your list of like how is the future going to change.
And I think that isn't felt enough. Well, this has been a fascinating conversation. I'd love to leave the last word to you. Like, where can folks go to learn more about you, the work you're doing at Anthropic? Anywhere you'd like to point them, the mic is yours. Where should I point them? I mean, I think the thing which most people should read that maybe like,
hasn't been read, is the interp work. Yeah. I really think that basic science of understanding what is going on in language models is really quite revealing. And as you sort of start to see them compose and generalize and build these circuits and reason over concepts, I think that will make it feel pretty real. They're long, they're intense, but they're well worth a read. I think that's fun. Nice. Well, thanks so much. This was awesome. Thank you very much.