
Alexandr Wang: Building Scale AI, Transforming Work with Agents & Competing With China

2025/6/18

Lightcone Podcast

People
Alexandr Wang, Jared
Topics
Alexandr Wang: I became deeply interested in artificial intelligence while studying at MIT, and took part in the YC accelerator. Initially we wanted to build a chatbot for doctors, but we quickly discovered that building chatbots required lots of data and human labor. So we changed direction and focused on providing data and language-data services to other companies. We launched ScaleAPI.com, a platform offering an API for human tasks, and released it on Product Hunt. The idea caught the attention of the startup community at the time. A few months later, we realized that self-driving cars were an important application area and decided to focus on that market. Although some people thought the market was too small, we believed self-driving was the future, and we ultimately built Scale AI into a large enterprise. Jared: A lot of people want to be the next Alexandr Wang, because everybody knows the story of how you dropped out of MIT and founded Scale, but they don't know the real story.


Chapters
This chapter traces Alexandr Wang's path from his time at MIT to founding Scale AI, highlighting his early exposure to AI, his decision to drop out of MIT, and the initial challenges and pivots that shaped the company's trajectory. It emphasizes the early focus on self-driving cars and the unexpected success of the 'API for human labor' concept.
  • Alexandr Wang dropped out of MIT to start Scale AI.
  • Scale AI initially focused on chatbots for doctors before pivoting to an API for human labor.
  • Self-driving cars became Scale AI's first major application, driven by a large contract with Cruise.
  • Early investors doubted the self-driving market's potential but Scale AI's rapid growth proved them wrong.

Transcript


Since we recorded this Lightcone episode with Scale AI CEO Alexandr Wang, Meta has agreed to invest over $14 billion in Scale, valuing the company at $29 billion. Alex has also announced he will lead Meta's new AI superintelligence lab.

The conversation you're about to hear covers the history leading up to this investment, from Scale's early days at YC to its integral role in the training of foundation models. Let's get to it.

The AI industry really continues to suffer from a lack of very hard evals and very hard tests that show the real frontier of model capabilities. The biggest thing is you just have to really, really, really care. When you interview people or when you interact with people, you can tell the people who just phone it in versus the people who hang on to their work; it's so incredibly monumental and forceful and important to them that they do great work. Very exciting time to see how the frontier of human knowledge expands.

Welcome to another episode of the Lightcone. Today we have a real treat: it's Alexandr Wang of Scale AI. Jared, you worked with Alexandr way back in the beginning, actually. What was that like? What year was it? Set the scene for us.

Yeah, Alex, I mean, most of what we want to talk about today is what Scale is doing now, because the current stuff is so, so awesome and so interesting. But since Scale got started at YC, it seemed appropriate to start all the way at the start. And it's funny: Diana and I were at MIT last month talking to college students. And of all the founders, the one that they most look up to and want to emulate is actually you. Everybody wants to be the next Alexandr Wang, because everybody knows the story of how you dropped out of MIT and ended up starting Scale, but they don't know the real story. And so I thought it'd be cool to go back to the beginning and just talk about the real story of how you ended up dropping out of MIT and starting Scale.

So before I went to MIT, I worked at Quora for a year.

So this was 2015 to 2016... or no, sorry, it was 2014 and 2015 that I worked there as a software engineer. And this was already a point in the market where ML engineers, machine learning engineers as they were called, made more than software engineers. So that was already the state of the market. I went to these summer camps that were organized

by rationalists, the rationality community in San Francisco. They were for precocious teens, but they were organized by

many people who have since become pivotal in the AI industry. One of the organizers was this guy Paul Christiano, who is actually the inventor of RLHF. He was at OpenAI for a long time, and now he's a research director at the US AI Safety Institute. Greg Brockman came and gave a speech at one point. Eliezer Yudkowsky came and gave a speech at one point. And when I was, I don't know, I must have been 16 or 17,

I was exposed to the idea that potentially the most important thing to work on in my lifetime was AI and AI safety. So that was something I was exposed to very early on. Then I went to MIT; I started when I was 18. I studied AI quite deeply; that was most of what I did as my day job. And then I

kind of got antsy and applied to YC. The initial idea was, okay, where can you apply AI to things? And this was in the

era of chatbots, which is crazy to think about, actually: there was this mini chatbot bubble. Yeah. Yeah, a hundred percent, in 2016, which was, I guess, spurred by Magic, or some of those apps, and Facebook had a big vision around chatbots. Anyway, there was this little mini chatbot boom. So the initial thing that we wanted to work on

was chatbots for doctors, right? Which is a funny idea, because, do you guys know anything about doctors? Yeah, no, not at all. Basically, no. It was just sort of, oh, doctors are a thing that sounds expensive. And I think it's indicative of something. I mean, you guys see this all the time, but I feel like most of the time, young founders' first 10 ideas are,

first of all, very memetic. So there are a lot of the same ideas: there's a dating app, there's something for social life, you know, the same ideas. And then I think young people have a very poor sense of alpha, of what the things are that they're actually

going to be uniquely positioned to do. And most young people don't have a sense of self, so it's not clear. So when we were in YC, we were roommates with another YC company. And we were observing this chatbot boom

happening at the time. And it was very clear that chatbots, if you wanted to build them, and this is funny to say in retrospect, required lots of data and lots of human elbow grease to get them to work effectively. And so, just kind of off the cuff at one point, it was like, oh,

what if you just did that? What if you just did the data, the language data, the human data so to speak, for the chatbot companies? We were also very lost, by the way. I think you probably remember; we were quite lost mid-batch, like many YC companies. And so then we

switched to this concept. The initial idea was "API for human tasks," or something along those lines. And one night I was just trawling around for domains; ScaleAPI.com was available, and we just bought it.

We launched it, I think, a week later. On Product Hunt. Yeah, I remember it. The Product Hunt page is still live; I was reading it last night, and I remember the tagline: it was "an API for human labor." That's my recollection of the distilled insight that you had: what if you could call a human with an API? Yeah. I think it took three days for us to put up the landing page and launch on Product Hunt. I think this idea captured

some amount of the imagination of the startup community at the time, because it was this weird form of futurism where you have

APIs delegating to humans in this interesting way. Yeah, it's an inversion of the usual, yeah. Yeah, exactly: humans doing work for the machines instead of the other way around. Yeah. It's funny, because in the initial phase we just worked with all these engineers who reached out to us from that Product Hunt launch, which was a real grab bag of use cases.

But that was enough for us to raise money at the time and get going. And then a few months after that, it became clear that self-driving cars were actually the first major application that we needed to focus on. And so there were many very big decisions, I would say, in the first years of the company. One thing that was curious:

at that point there were other solutions that were already the game in town. Mechanical Turk from Amazon was the thing that people were using, but you ended up capturing this whole other set of people that didn't know about it, and you had a way better API, and you kind of won.

It was not clear at that point, because you probably were compared a lot with Mechanical Turk. Yeah, Mechanical Turk was definitely the concept in most people's minds at the time. It was kind of one of these things where a lot of people had heard about it, but anyone who had used it knew it was just awful. And whenever you're in a space where

people mention a thing, but it sucks, that's usually a pretty good sign. And so that was enough to give us early confidence. But the thing that was actually fundamental to the success of the company was focusing on this seemingly very narrow problem of self-driving cars. I think that,

you know, I remember very early on, maybe six months after we were out of YC, there was another YC company, Cruise, that had reached out to us through our website. And in the blink of an eye, they became our largest customer. And they found you just from your launch? Yeah, maybe they even Googled us; it's not totally obvious, but vaguely from our launch. It was actually an ex-YC founder who was working at Cruise who reached out to us. So maybe some

YC mumbo jumbo.

Who knows? The world works in mysterious ways. And so they grew very, very large. So then, early on, we made this decision, and I remember we went to our lead investor at the time and had this conversation: hey, actually, we think we should probably just focus on this self-driving thing. It was a very interesting conversation, because the reaction was, oh, that's just obviously way too small a market, and you're never going to build

a gigantic business that way. And we're like, we think it's probably a much bigger market than you think it is, because all these self-driving companies are getting crazy amounts of funding, the automotive companies are doing huge programs in self-driving, and it clearly is the future. It feels like

something that should exist. And so we're like, if we focus on it, we think we can build the business much more quickly. And it's funny looking back, because both things are true: it's true that it enabled us to get to scale very quickly, and it was also true that it was not a big enough market to sustain a gigantic business. The story of Scale in many ways is this progression of, how do you continue? AI is this incredibly dynamic space; lots of things are constantly changing, and

a lot of what we pride ourselves on at the company is how we've been able to continue building on and contributing to this very fast-moving industry. When did you become much more aware of the scaling laws? Because one of the interesting facts that has emerged is that you're a little bit the Jensen Huang of data. I think that in self-driving,

scaling laws were not really a thing, and the biggest reason was that one of the biggest problems in self-driving is that your whole algorithm needs to run on the car, so you're very constrained by the amount of compute you have access to. So

a lot of the engineers and a lot of the companies working on self-driving never really thought about scaling laws. They were all just thinking about, okay, how do you keep grinding these algorithms to be better and better while staying small enough to fit onto these cars. But then we started working with OpenAI in 2019. This was the GPT-2 era. And I would say,

GPT-1... GPT was sort of this curiosity. With GPT-2, I remember OpenAI would have a booth at these large AI conferences, and their demo was to let researchers talk to GPT-2. And it was,

like, not perfectly impressive, but it was kind of cool; it was kind of this thing. And then by GPT-3, that's when the scaling laws clearly felt very real. And that was, I mean,

I think GPT-3 was 2020. So it was actually long before... before the world caught on to what was happening. Yeah. Did you know as early as 2020? Did you have a strong inkling that this was really going to be the next big chapter of Scale? Or not until ChatGPT took off? Was it clear at 3.5, or 4? I think that,

in 2020, it was clear that scaling laws were going to be a big thing, but it was still not totally obvious. I remember this interaction: I got early access to GPT-3, in the playground, and I was playing with it with a friend of mine. I told the friend, oh, you can talk to this model. And during the conversation,

my friend got visibly frustrated and angry at the AI, but in a way that wasn't just "oh, this is a dumb toy." It was in a way that was somewhat personal. And that's when I realized, whoa, this is somehow qualitatively different from anything that existed before. Did it feel like it was passing the Turing test at that point? Kind of; there were semblances. Yeah, semblances, glimpses of it potentially passing the Turing test, right? But I think the thing that really

caused the recognition of, I would say, generative AI, which is still the term in some ways, was really DALL-E. I think that's what convinced everyone. But my personal journey was that GPT-3 was highly interesting, and so it was one of many bets at the company. And then in 2022, over the course of DALL-E,

and then later ChatGPT and GPT-4, et cetera, and we worked with OpenAI on InstructGPT, which is the precursor to ChatGPT, it became very obvious that that was the bet-the-farm moment for the company and, frankly, for the world. That's when we saw it as well, with the big shift in companies, because it was that 3.5 moment, the release at the end of 2022. And we started seeing a bunch of companies and smart people

changing directions and pivoting their companies in 2023. And that was that moment. This dynamic that you referenced, Scale as kind of the NVIDIA for data, I think that became quite obvious, you know,

I would say GPT-4 really was the moment where it was like, wow, this is like, like scaling laws are very real. The need for data will basically, you know, grow to consume, you know, all available information and knowledge that humans have. And so it was like, wow, this is like this like astronomically large opportunity.

Yeah, 4 seemed like the first time it was something that you could get to basically never hallucinate. You could actually have a zero-hallucination experience in limited domains, and we're still sort of in that regime even at this point. You know, the classic view is that if it's hallucinating, you're not giving it the correct data in the prompt or context, or you're trying to do too much in one step.
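That "give it the right data, one narrow step at a time" view can be made concrete. A minimal sketch, assuming the OpenAI Python client; the model name, prompt wording, and example context are illustrative placeholders, not anything from the episode:

```python
# Sketch of grounding a model in supplied context to limit hallucination.
# Assumes an OpenAI-style chat API; model name and snippet are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def grounded_answer(question: str, retrieved_context: str) -> str:
    """Answer strictly from supplied context rather than from memory."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using ONLY the provided context. "
                    "If the context does not contain the answer, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{retrieved_context}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content

# One narrow, grounded step rather than one giant open-ended ask.
print(grounded_answer("When was the contract signed?",
                      "The MSA was signed on 2024-03-14."))
```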

Yeah. I mean, I think the reasoning paradigm has a lot of legs, and it's actually been interesting, this last era of model improvement, because

the gains are not really coming from pre-training; we're moving onto a new scaling curve of reasoning and reinforcement learning. But it's shockingly effective. And I think that, you know,

the analogies between AI and Moore's Law are pretty clear, which is that you'll get on different technical curves, but if you zoom way out, it'll just feel like this smooth improvement of models. One of the things that has been popping up with some of the really big, well-known wrappers is that they're getting access to full-parameter fine-tunes of the base models, especially the frontier closed-source base models.

Is that a big part of your business, or something that people are coming to you for, these verticalized full-parameter fine-tune datasets? Yeah, I think this is going to be a blueprint for the future. Right now, the total number of

large-scale full-parameter fine-tuned or reinforcement fine-tuned models is still pretty small. But if you think about it, one version of the future is that every firm's core IP is actually their specialized model, their own fine-tuned model. Just in the same way that, today,

you would generally think that the IP of most tech companies is their code base, in the future you would generally think that their specialized IP might be the model that powers all of their internal workflows. And what are the special things they can add on top? Well, they can add on data and environments that are very, very specific to the day-to-day problems or information or business problems that they see

on a day-to-day level. And that's the kind of really gritty, real-world information that nobody else will have, because nobody else is doing the exact same business motion as them. There's a lot of weird tension in that, though. I remember friends of ours from one of the top model companies came by, and they were like, hey, do you think YC and YC companies would give us their evals so we could train against them? And we were like, no, dude, what are you talking about?

Why would they do that? That's their moat. And I guess now, based on this conversation... I mean, evals are pretty important as a part of RL cycles. And even then, the evals are not really the valuable part; the valuable part is actually the properly fine-tuned model for your dataset and your set of

problems. Yeah. It's like these Lego blocks, right? If you have the data, and you have the environments, and you have the base model, you can stack those on top of each other and get a fine-tuned model. And obviously the evals are important.
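A minimal sketch of that Lego-block stacking, under stated assumptions: every name below is hypothetical, and build_specialized_model is a stand-in for a real fine-tuning or RL run, not a real Scale or lab API:

```python
# Proprietary data + environments layered on a base model, with evals as the
# yardstick. Mirrors the shape of the pipeline described above, nothing more.
from dataclasses import dataclass

@dataclass
class CompanyAssets:
    datasets: list[str]       # gritty day-to-day business data
    environments: list[str]   # tasks encoded as RL environments
    evals: list[str]          # held-out tests of the business problems

def build_specialized_model(base_model: str, assets: CompanyAssets) -> str:
    """Stack the blocks: base model + data + environments -> fine-tuned model."""
    # Stand-in for a real fine-tuning / reinforcement-learning run.
    return (f"{base_model}-ft"
            f"({len(assets.datasets)} datasets, {len(assets.environments)} envs)")

assets = CompanyAssets(
    datasets=["support tickets", "sales workflows"],
    environments=["refund-processing environment"],
    evals=["held-out refund cases"],
)
print(build_specialized_model("frontier-base-v1", assets))
# The evals stay in-house: publishing them would hand competitors your yardstick.
```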

This is some of the tension, and this is basically, in a nutshell, the question of: does AGI become a Borg that just swallows the whole economy as one firm, or do you still have a specialized economy? My belief, generally speaking, is that you still do have

a specialized economy. These models are platforms, but alpha in the modern world will be determined by the degree to which you're able to encapsulate your business problems into datasets or environments that are conducive towards

building differentiated models or differentiated AI capabilities. Yeah, that's why asking for the evals was so crazy to me, because, okay, they get the evals, the base model gets way better, and now all your competitors have exactly the same thing that used to be your advantage. I think we will undergo a process in AI where we learn

what the bright lines are, right? I mean, it's very obvious and intuitive to tech companies that they should not give away their code base and they should not give away their data. The analogs of that in a highly AI-fueled economy we'll identify over time, but they are, yeah: the evals, your data, your environments, et cetera. I think you have a very techno-optimistic view of what the future is going to be, of how jobs are going to be shaped.

Can you talk more about that? Because I think you hinted at it before. It's going to be more specialized. It's not that all these jobs are going to

go away, right? First off, it's undeniably true that we're at the beginning of an era of a new way of working. You know, there's this term that people have used for a long time, which is "the future of work." Well, we're entering the future of work, or certainly the next era. And so work fundamentally will change. But I do think

humans own the future. We have a lot of agency, actually, and a lot of choice in how this reformatting of work, or of workflows, ends up playing out. You know, I think you kind of see this play out in coding right now, and I think coding in some ways is really the

case study for other fields and other areas of work, where the initial phase is this assistant-style thing: you're kind of doing your work, and the models are assisting you a little bit here and there. And then you go to the Cursor agent-mode kind of thing, where you're

synchronously asking the models to carry out these workflows, and you're managing one agent, or you're kind of pair programming with a single agent. And then now, with Codex or other systems, it's very clear the paradigm is: you have this

swarm of agents that you're going to deploy on all these various tasks. You're going to give out all these tasks, and you'll have this cohort of agents doing the work that you think is appropriate. And that last job has a semantic meaning in the current workforce: it's a manager. You're basically managing

this set of agents to do actual work. And I think that AGI doomers or whatnot take the view that even this job of managing the agents will just be done by the agents, so humans will be taken out of the process entirely. But our belief, my personal belief, is that this

is very complicated. Management is also about what the vision is that you have, and what the end result is that you're aiming towards. And those will fundamentally, I think, stay human: we have a human-

demand- and human-desire-driven economy, so those will be driven by humans. And so I think the terminal state of the economy, in a nutshell, is humans managing agents at large scale. I have a funny story: a founder friend of mine was trying to promote

one of his junior employees, who is really, really smart and works on their agent infrastructure. And he was like, hey, I'm looking for someone who could step into management. You've never managed people before; if we hired some people under you, how would you feel about that? And this mid-twenties, really smart engineer was like,

why would I do that? Just give me more compute. Look at what just happened to the model literally last month: I didn't have to do anything, and it just started doing things that it couldn't do a month ago. Why would I want to manage people? I will just manage more agents for you, and it's fine. Okay, so what are the unique things that humans will do over time?

This element of vision is very important. And this element of debugging, of

fixing things when they go wrong. Most of a manager's job, speaking as a manager, is putting out fires, dealing with problems, dealing with issues that come up. Intuitively, the idealized manager job seems like this very cushy job, because all the other people do all the work and you just vaguely supervise; the reality is obviously highly chaotic. I think people often jump to this extreme reality where it's like,

oh yeah, you're just going to manage the agents, and you're going to live this kind of Victorian life where all your problems are solved. But no, I think it's still going to be pretty complicated: getting agents to coordinate well with one another, coordinating the workflows, and debugging the issues that come up. These are still complicated problems. And,

you know, having seen what happened in self-driving, which was, more or less, that it's easy to get to 90% and very, very hard to get to 99%, I think something similar will happen with large-scale agent deployments, and that final 10% of accuracy will require a lot of work. Yeah, even for self-driving cars right now, there's remote assist for all these super edge cases. So there's still a human at the end managing the car. Yeah, and the ratio, by the way...

The companies don't publish them, but I think the ratio is something like five cars to one teleoperator, or maybe even fewer, maybe three cars per teleoperator. So the ratio is

much lower than people think. I think humans are much more involved, even in self-driving cars, than most people appreciate. I mean, which, if you put it in that perspective, is still very optimistic: the output is just getting rides. In today's world, if you're an Uber driver, you drive one car; in this world, you can run five cars, right? Well, for an optimistic version of the future, where unemployment is still low, et cetera, you just have to believe that humans are almost

insatiable in their desire and their demand, and that prices will go down, the economy will become more efficient, and we'll just want more. And I think this has been a pretty consistent

trend for the history of humanity: we have somewhat insatiable demand. And so I have conviction that the economy can get as efficient as it needs to, can get hyper, hyper efficient, and human demand will just continue to fill the bucket.

In the 20th century, maybe the early 20th century, when you said "computer," people didn't think of a computer as it is today. They thought of a human being who would sit in front of a punch-card tabulator. That was what a computer was; it was a job title. Literally, that was a real person's job. And then, of course, now today it's like, where are all the computers? Well, they're actual computers now. That was true even for the Apollo missions: it was a bunch of

people just crunching the numbers for the trajectories of Apollo. Because the computer that went on the rocket was actually a microcontroller running at, I think, only single-digit megahertz, a very tiny amount of computation. It was just humans doing it. Totally. And even the concept of being a programmer is somewhat esoteric, in the sense that, oh, you're writing the instructions for these machines

to just continue to carry out repetitively. And in some ways, the leverage boost that all humans will get is similar to the leverage boost that programmers have had historically. For a long time, and a lot of people in Silicon Valley say this, the closest thing to alchemy in our world, pre-AI let's say, was programming, because you can do something that creates infinite replicas of whatever you build, and they can run

an infinite number of times. And I think the entire human workforce will soon see that large of a leverage boost, which is extremely exciting, because programmers have benefited over the past few decades from this unique perch where a

10x or 100x engineer can build something absolutely incredible, very, very valuable, and shockingly productive. And all of a sudden, I think humans in all trades will gain that level of leverage. So I'm curious to return to a point that you made earlier, about how Scale has kept reinventing itself. If you had to describe the arc of Scale, what's the story, and

what were the turning points? Our initial business was all around producing data, generating data for various AI applications. And primarily self-driving car companies, right? For the early years, like you were saying, you were really focused on that. Yeah, for the first three years, fully focused on that. One of the properties of building that business is that, over time, we had this obligation to really

get ahead of most of the waves of AI, if that makes sense. Because for AI to be successful in any vertical area, it needed data. And so demand for

our products would often precede the actual evolution of AI into those industries. So, as an example, we started working with OpenAI on language models in 2019. We started working with the DoD on government AI applications and defense AI applications in 2020, long before the recent drone-fueled

AI craze in the Department of Defense. We started working with enterprises long before the recent, larger waves around enterprise AI implementation. So almost systemically, or

intrinsically, we've had to build ahead of the waves of AI. I think this is actually quite similar to NVIDIA. Whenever Jensen gives his annual presentations about NVIDIA and its future and its outlook, he is always so ahead of the trends, and that's because he has to get there before the trend can even happen. That's, I think, been one thing,

one way in which our business has continued to adapt, because AI is, I think, the fastest-moving industry ever in the history of the world. Each turn, each evolution, has moved incredibly quickly. The other thing that happened, maybe 2021, early 2022: we started working on applications. And so we started building out

AI-based applications, and now much more so agentic workflows and agentic applications, for enterprise and government customers. And this was an interesting evolution of our business, because historically our core business is highly operational. You know, we build this data foundry; we have all these processes to produce

data. It's a very operational process that involves lots of humans and human experts to produce data with quality-control systems in place. That highly operational business, and the success of that business, is what created the momentum for us to dream about building an applications business. When we went into it, I had studied other businesses that had successfully added on

very different businesses: what are the unique traits, and why do some of those work? And the one that is probably the most interesting, I think the most singular in modern business history, is

Amazon building AWS. You know, if in 2000 you had written a short story that said that this large online retailer would build this large-scale, rent-a-server cloud computing business, it would seem nonsensical. I remember when they launched AWS in 2006:

Amazon's stock went down, because all the analysts thought it was such a terrible idea. It had never been done before; it didn't seem related at all to their core business; it was this weird thing. But the wisdom of it was, I think, twofold. From talking to people who were there at the genesis moment of that business, probably the most important thing was that they had conviction that

the underlying business model of AWS would basically be this infinitely large and growing market; that market would literally grow forever. There would be this exponential in the amount of compute that needed to be built up in the world, and if you did that, there were sufficient cost advantages from economies of scale. I think with startups, you kind of have to switch modes at a certain point, where early on,

you're trying to go for very, very narrow markets, almost the narrowest markets you can, and then you're just trying to gain momentum and slowly grow out from those hyper-narrow markets. And then at some point, if you have ambitions to be a $100 billion company or more, you have to switch gears and say, where are the infinite markets, and how do you build towards those infinite markets? And so this was the moment where we realized that.

The simple realization was that every business and every organization was just going to have to reformat their entire business with AI-driven technology, and now, obviously, agent-driven technology. And over time, that would swallow the entire economy. And so it was another one of these: okay, that's an infinite business to build out,

AI applications and AI deployments for large enterprises and governments. I think a lot of people don't realize that you guys are in the middle of this transformation. They still think of Scale as the data-labeling company. But if you fast forward 10 years,

do you think most of Scale will actually be the agent business? Yeah, it's growing much faster at this point. I think it's an infinite market. The crappy thing about most markets is that they have a pretty shallow S-curve. But then, you know, you look at

hyperscalers, or these mega-cap tech companies, and they just have these ridiculously large markets. So you really want to get into these infinite markets. Our strategy so far has been to focus on building use cases for a small number of customers

and be quite selective. So we work with, you know, the number one pharma company in the world, the number one telco in the world, the number one bank, the number one healthcare provider. And we work a lot with the US government, the Department of Defense and other government agencies. And the whole thing is, how do we take a very focused approach towards building things that resemble

real, differentiated AI capabilities. And all of this, I think, sounds somewhat trite, but we have this multi-hundred-million-dollar business in building all these applications. By my count, I think it's one of the largest AI application businesses in the industry; that's certainly what our investors tell us. And it's fueled by

our differentiation in the data business, because our fundamental belief is what we talked about before: the end state for every enterprise or organization is some form of specialization imbued to them by their own data.

Our day jobs historically have been producing highly differentiated data for the large-scale model builders of the world. And then we can apply that wisdom and that capability, those operational capabilities, to enterprises and their unique problem sets, and give them specialized applications. Honestly, it kind of sounds like Palantir. Yeah,

at the most zoomed-out level. Yeah, if you sort of squint. In that you're a technology provider. We're a technology provider to some of the largest organizations in the world, with a focus on data. Yeah. And I think the key difference is, you know, Palantir

has built a real focus around these data ontologies and really solving this messy data integration problem for enterprises. And then our whole viewpoint is what is the most strategic data that will enable differentiation for your AI strategy? And how do we...

generate or harness that data from within your enterprise towards developing that. I guess you will end up being pretty big competitors in another five, ten years. But for now, it's basically all greenfield. I mean, I think it's an infinitely large market. Yeah. So you might not ever meet, actually, which is interesting. Yeah. I think in practice now, frankly,

we're more partnered with Palantir than competitive with them. Well, that's because the problems at these giant organizations are actually so massive and intractable that they throw up their hands. They have no shot at ever hiring people who could possibly solve the problem. But a company like Scale or a company like Palantir can actually hire kind of the same kind of people who would apply to YC, actually. Yeah. The through line in my head right now is realizing that,

you know, there's plenty of capital, and the limiting reagent is actually really great technical, smart people who are optimistic and actually work really hard. There are just not enough of those people. That's true for the world. And by the way, I think one of the cool things,

as we were talking about before, is that all of a sudden those people get near-infinite leverage. So I think that bottleneck gets exploded now, hopefully, due to AI. Again, I think just like how in cloud, AWS is the largest by far, but there are so many other cloud providers; it's not a winner-take-all kind of business per se, and it doesn't have to be. Yeah, exactly.

It's just too big of a market to even be close to winner takes all. There's no single organization that can have the...

operational breadth to be able to swallow the whole market. Talking about operations: you clearly are living in the future, which is super cool. I'm sure you're running Scale with all these agents and tools already to make it very efficient. Could you share some of the things that you're doing internally as a company, and agents you're adopting, so you can do more with fewer people? We saw this early, because when

the model developers were starting to develop agents using reinforcement learning, actual reasoning models where the models could really do end-to-end workflows, we were responsible for producing a lot of the datasets that enabled the agents to get there. And then we saw just how effective that training process is. I think the efficacy of reinforcement learning for agent deployments

is pretty insane. So then, once we realized that, we realized, okay: if you can actually turn existing

human-driven workflows into environments and data for reinforcement learning, then you have this ability to convert these human workflows, especially ones where you're okay with some level of fault and okay with a certain level of reliability, into agentic workflows. So there are all sorts of

agent workflows that happen in our hiring processes, that happen in our quality-control processes, that automate away certain data analyses and data processes, as well as various sales reporting. It's embedded at every major org of the company. And the whole thing is mindset: can you identify these very,

very repetitive human workflows, and basically undergo this process where you convert them into datasets that enable you to build automation tools? What do these datasets actually look like? I mean, for browser use, is it an environment, and then, here's a video of a human being going through this process of filling out this form and deciding yes or no on this dropdown, or something? What's a concrete example, just for the audience? One of the processes that we go through is,

you'll take a full packet from a candidate, and you'll want to distill that into a brief of some sort that gives all the salient details about that candidate, for a decision by a broader committee. These kinds of cases, broadly speaking, "deep research plus plus" kinds of things, are the lowest-hanging fruit. It's just, can you take these processes that

more or less look like: you have to click around a bunch of places, pull a bunch of pieces of information, blend them together, and then produce some analysis on top of that. That fundamental, information-driven analysis process is the easiest thing to drive via agentic workloads. And the kinds of data you need, we call them environments, but usually it's just: what is the task, what is the full

dataset that's necessary to conduct that task, and what is the rubric for how you conduct it effectively?
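A minimal sketch of what one such record might look like, following that description (task, the dataset needed to do it, and a grading rubric); the schema and field names are hypothetical, not Scale's actual format:

```python
# One hypothetical "environment" record: a task, its inputs, and the rubric
# a grader (human or model) uses to score the output.
import json

candidate_brief_task = {
    "task": "Distill a candidate packet into a one-page brief for the committee",
    "inputs": {                      # the full dataset needed to do the task
        "resume": "resume.pdf",
        "interview_notes": ["screen.md", "onsite.md"],
        "references": ["ref1.txt"],
    },
    "rubric": [                      # how to judge whether it was done well
        {"criterion": "All salient details present", "weight": 0.5},
        {"criterion": "No unsupported claims", "weight": 0.3},
        {"criterion": "Fits on one page", "weight": 0.2},
    ],
}

def rubric_score(grades: dict[str, float], record: dict) -> float:
    """Weighted score assigned against the rubric (grades are in [0, 1])."""
    return sum(item["weight"] * grades[item["criterion"]]
               for item in record["rubric"])

print(json.dumps(candidate_brief_task, indent=2))
print(rubric_score({"All salient details present": 1.0,
                    "No unsupported claims": 0.8,
                    "Fits on one page": 1.0}, candidate_brief_task))  # 0.94
```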

Do you need RL and fine-tuning when prompt engineering and meta-prompting seem so good? I mean, as the models get better, prompting will get better; but prompting gets you to a certain level, and then reinforcement learning gets you beyond that level. And actually, this is a good point: I think probably most of the time in our business, it's

mostly prompting; that just works really well. I mean, that's the weird thing: oh, shoot, you don't have to crack open the models. And then, frankly,

the next models are going to be so good, and then the evals are mainly about picking which model, or at what point you switch to the next one. I do think startups basically need a strategy for how they will walk up the complexity curve, so to speak. Whatever product or business you build needs to really benefit from the ability to race up this complexity curve, which is the

broader curve of capability of the models. I mean, you actually created this leaderboard that has a lot of these super hard tasks that try to get at this next curve of reasoning. Can you tell us about it? One of the things that we built, in partnership with the Center for AI Safety, is Humanity's Last Exam. It was a funny name; I think, unfortunately, there will be yet another exam beyond it. But the idea was, let's effectively work with

the smartest scientists in the field. And, you know, we worked with many very brilliant professors, but also many individual researchers who are quite brilliant. And we just collated and aggregated this dataset of what the smartest researchers in the world would say are

the hardest scientific problems that they've worked on recently. They were able to solve the problems, but they're the hardest problems that they're aware of and know of. I was curious how you came up with these problems. So each of the professors contributed new problems. These are problems that have never appeared in any textbook or any exam, ever. They just came out of their brains, and they typed up a new problem from scratch. Am I understanding this right? Yeah. And the general guidance was, you know,

what has come up recently in your research that you think is a particularly hard problem, right? The problems are stupidly hard, incidentally. They're insane. I don't know if you guys have looked at these problems; they're totally crazy. Yeah, it's totally crazy. And by the way, they cannot be searched on the internet. You need to have a lot of expertise and actually think about them for quite a long time. Yeah, they require a lot of reasoning. Right now, we have a time limit where the models

can only think for, I think, 15 to 30 minutes. And one of the most recent requests from one of the labs was, can you please increase that time limit to a day, so that the model has up to a day to think about the problems. But yeah, they're deviously hard problems. Unless you have expertise in the specific problem, you probably don't have a chance of getting it right. But even on this evaluation: when we first launched it,

and this was earlier this year, the best models were scoring like 7%, 8% on it. Now the best models score north of 20%. It's moved really, really quickly. So you think we're going to get benchmark saturation for this one as well? I think eventually, yeah, it'll be saturated, and then we have to move on to new evaluations. I mean, I think

the saving grace for the naming was that it is the last exam; the new evals will be real-world tasks, real-world activities, which are fundamentally fuzzier and more complicated. Have you solved any of the problems yourself, Alex? I know you were a competitive math person for a long time. Yeah, yeah. I mean, the math problems require a lot; they're very deep in their fields. I think I managed to get a handful, but most of them are hopeless for me. Yeah.

Yeah, I looked at the ones that the models can solve. So that was one of the evals, and we've produced a number of other evaluations. But yeah, I think the AI industry really continues to suffer from a lack of very hard evals and very hard tests that show

the real frontier of model capabilities. And these evals, when you build an eval that becomes popular in the industry, have this deeper effect, which is that all of a sudden that's the North Star and the yardstick that researchers are trying to

optimize for. And so it's actually this very gratifying activity. You know, we built Humanity's Last Exam, and most of the model providers will always report their results on it. There are tons of researchers who are really motivated by doing a good job. And the models are going to get, you know, deviously good at frontier research problems.

I guess Sam has started to talk about that stage-four "innovators" phase of AGI coming, and that's the prognostication for the next year. Do you think that's correct, that the next 12 to 24 months is really the moment when literally new scientific breakthroughs

come from the operation of reasoning in these models? I mean, I think it's super plausible, in fields like biology especially, and this is probably one of the ones that comes up the most: there are probably intuitions that the models have about biology that humans don't even have, because they have this different form of intelligence, right? And so you'd expect there to be some areas, in some fields, where the models

have some fundamental, deep advantage versus humans. And so I think it's very realistic to expect it in those fields. Biology, I think, is the clearest one for me. It kind of already happened for chemistry: last year the Nobel Prize went to the Google DeepMind team, Demis Hassabis and John Jumper, for AlphaFold. Yeah, exactly. That was a huge jump. Before that, there was this competition where they were trying to get more protein-folding structures solved, and

progress was abysmal, and AlphaFold destroyed it. It's a strange time to be a scientist, but an exciting time for science. There's this short story that talks about this future where there are,

effectively, AIs that are conducting all the frontier R&D research, and what scientists do is just look at the discoveries that the AIs make and try to understand them. Yeah, I mean, I think it's a very exciting time to see how the frontier of human knowledge expands. And then, I mean, I think that'll be great, because areas like

biology will fuel breakthroughs in medicine and healthcare and all these other things. And then the majority of the economy will chug along, you know, giving humans what they want. China open-sourcing, or DeepSeek open-sourcing, their models is another very interesting question: how does that play out? And there's this awkward

fact that the best open-source models in the world now come out of China. I mean, that's this awkward reality to contend with. What do you think we can do to make sure that it's the American models that are ahead? Or is that written in the stars? Something tells me it's not.

The simplest explanation for me for why the Chinese models are so good is espionage. I think that there are a lot of secrets in how these frontier models are trained. And when I say secrets, it sounds more interesting than it is: there's just a lot of tacit knowledge. There are a lot of tricks and small intuitions about where to set the hyperparameters, and ways to make these models work and to get the model training to work. The Chinese labs

have been able to move so quickly and accelerate and make such fast progress, whereas some even very talented US labs have made progress less quickly. And I just purely think it's because a lot of the secrets about how to train these models leave the frontier labs and make their way back to these Chinese labs. I think the only way to model the future is that China has

pretty advanced models. You know, the solace right now is that they're not the best models; they're sort of a half step behind, let's say. But it's tough to model what will happen when it's truly neck and neck. We're very behind on energy production, which is just pure regulation; that could be fixed in two seconds, but it hasn't been yet. That's a huge problem. I mean, not that the

past will be a predictor of the future, but if you look at what US total grid production looks like, it's flat as a pancake. And if you look at Chinese aggregate grid production, it has doubled over the past decade; it's just straight up. I saw that, and it's astonishing. I mean, that's just a policy failure. Now, in China the vast majority of that is coal, and coal is growing in China. And

in the United States, actually, renewables have grown a lot, but renewables trade off against fossil fuels. So we've done a transition of our energy grid, whereas they are just

continuing to compound, let's say. So we have this issue on power production, but we're advantaged on chips. I think, net-net, we will come out ahead on compute. If you look at data, and this goes towards a lot of the questions you've been asking, I think China is fundamentally very well positioned on data. It's weird to say, because obviously

we help all the American companies with data. But in China, they can ignore copyright and other privacy rules, and they can build these large models with abandon. And then the second issue is that there are actually large-scale government programs in China for data labeling. There are data labeling centers

in various cities that have been started up by the government itself; there are large-scale subsidies for AI companies to use data labeling, a voucher system. In fact, there are college programs, because one of the interesting things is that in China, employment is such a large national priority that,

when they have a strategic area like AI, they'll figure out, okay, what are all the jobs, and they'll create these funnels to create those jobs. And then we're seeing this in robotics data too, where in China there are already

large-scale factories full of robots that just go and collect data. And strangely enough, even a lot of US companies today actually rely on data from China in training these robotics foundation models. Long story short, I think China likely has an advantage on data. And then on the algorithms, the US is,

on net, much more innovative, but if espionage continues to be a reality, then you're basically even on algorithms. So it's hard to model, but I think it's probably

60-40 or 70-30 that the United States has an undeniable continued advantage, but there are a lot of worlds where China just catches up or potentially even overtakes. I mean, the scary thing for me is watching Optimus; or, YC has some robotics companies, like Weave Robotics, and

we look at those things. The software can be as good as or better than anything coming out of China. But when it comes to the hardware, the BOM cost over here is $20,000, $30,000; we can't even make high-precision screws over here. And over there, the same robot, the embodied robot, could be made for, I don't know, $2,000, $3,000, $4,000. You just walk down a street in Shenzhen and they've got it. And so how do you compete against

that at a state level? The degree to which China is incredible at manufacturing,

I mean, that's a very big problem. And it relates to defense and national security. It's a fundamental issue because on some level, defense and national security will boil down to which countries have more things that can deter conflict or can shoot other things down. Yeah, I don't think it's going to be fighter jets and aircraft carriers anymore. I mean, it's probably going to be this micro war. It's like hyper micro. It's

drones and embodied robots. And I mean, yeah, exactly. Drones, embodied robots, cyber warfare, the, um,

cold war era philosophy of like, you know, you build like bigger and bigger bombs. It's like the exact opposite of that. It's actually like, it's like the fragmentation and, and, and move towards sort of like, you know, smaller, more nimble, attributable resources is the, is the, that's like one of the big picture trends, I would say. And then the other big picture trend is just what we believe, which is,

the move towards agentic warfare, or agentic defense. Basically, if you actually mapped out what warfare looks like today, the actual process of a conflict, and you look at Russia-Ukraine or other conflict areas, the decision-making processes are remarkably

manual and human-driven. All these very critical battle-time decisions are made with very limited information, unfortunately, in these very manual workflows. And so it's very clear that if you used AI agents, you would have perfect information and you would have immediate decision-making. So we're going to see this huge shift towards AI

agent-driven warfare and agent-driven conflict. And it has the potential of turning these conflicts into almost incomprehensibly fast-moving kinds of scenarios.

And that's something that you guys are actively working on, right? Is there anything that you can talk about? I assume some of it is classified, but... Yeah. So one of the things we're doing is building this system called Thunder Forge with the Indo-Pacific Command out in Hawaii, which is responsible for the Indo-Pacific region. And it is the flagship

DoD program for using AI for military planning and operations. So we're basically doing exactly what I said. We take

the existing human workflow, where the military works in what's called a doctrinal way, governed by the doctrine of this very established military planning process, and you just convert that into a series of agents that work together and conduct the exact same task, but it's all agent-driven. And then all of a sudden you turn these

very critical decision-making cycles from 72 hours to 10 minutes. And it changes the game: when you play chess versus a human, they spend all this time thinking; it's this slow game. If you play chess against a computer, it's just these immediate moves back. It's this sort of unrelenting form of warfare. Some of it is that being able to see the chain of thought immediately was...

Is the most powerful thing. Yeah. Because I don't want just the answer; I want to see how you got there. And actually seeing the reasoning itself was so powerful. I mean, that's actually why the launch of that first DeepSeek was way more interesting. I think o1 had come out, but they hid the reasoning. And it's like, no, the reasoning is actually a really important part of it.

And the only reason why they hid it was they didn't want other people to steal it, which they did anyway. I think that's another

interesting thing about this space: so far you could really model it as, there are advanced capabilities, and you can try to keep them secret and keep them closed, but they open up over time, kind of no matter what you do. Well, I mean, clearly, Alex, you've done a lot of incredible things and transformed your company multiple times, and you have deep subject-matter expertise in many areas. You're clearly hardcore.

Is there advice for the audience on how to be more like you? You know, I think the biggest thing is you just have to really, really, really care. And I think it's a folly of youth in some ways

that when you're young, almost everything feels so astronomically important that you try immensely hard and you care about every detail. Everything matters just way more to you. And I think that trait is really, really important, and it exists in varying degrees in different people. So I wrote this post many years ago called Hire People Who Give a Shit. Yeah.

It really is pretty simple. I noticed, when you interview people or when you interact with people, you can tell the people who just sort of

phone it in versus the people who hang on to their work, for whom it's so incredibly monumental and forceful and important that they do great work. It eats at them when they don't do great work, and when they do great work, they're so satisfied with themselves. So there's this magnitude of care. And one of the greatest indicators of

just how much I enjoyed working with people, or, frankly, how successful they were at Scale, was really this: to what degree is their soul invested in the work that they do?

And so if you were to pick one thing that probably is the unifier in some way, it's that I care a lot. I care a lot about every decision we make at the company. I still review every hire at the company; we have this process where I approve or reject literally every single hire. And so I care immensely, and I work with all these people who care immensely. And that enables us to

feel much more deeply what happens in the business. And as a result, we'll change course more quickly. We'll learn more quickly. We will take our work more seriously. We'll adapt more quickly. And I think that's been quite important to the success that we've had. Alex, you were telling me a story recently that stuck with me about how, quite recently, even when Scale was a very large company, you were personally hand-reviewing all the data that was being sent to partner companies and acting as

basically the final quality control: you know, that data point is not good enough. Yeah, exactly. I think a lot of founders would probably agree with this, but what your customers feel, when your customers are happy and sad, it really gets to you. And so when you have unhappy customers, it's a personally very painful thing. Broadly speaking,

you know, we have this value at our company: quality is fractal. And I do believe that

high standards trickle down within an organization. It's very rare that you see an organization where standards increase as you get lower and lower down in the organization. Most of the time, when people realize their manager or their manager's manager or their director or whomever doesn't really care, then that

removes the deep desire to care. And so it's incredibly important that high standards and this deep care for quality are a deeply embedded tenet of the entire organization. Founder mode, man. Founder mode. Man, we got to have you back. Thank you so much for spending time with us. With that,

sorry we're out of time, but we'll see you next time.