
Ep 57: Former CTO of Meta Mike Schroepfer on the Path to Powering the AI Revolution

2025/3/5

Unsupervised Learning

People
Jacob Efron
Mike Schroepfer
Topics
Mike Schroepfer: I see AI's enormous demand for energy as, counterintuitively, a great opportunity to drive innovation and sustainability in the energy sector. The United States needs to significantly expand its grid capacity, and AI demand is accelerating that process. It is pushing people to rethink where energy comes from and how efficiently it is used, and it is driving the development and deployment of new energy technologies such as solar, fusion, and ocean energy. Going forward, falling energy costs will be a key factor both in advancing AI and in raising living standards worldwide. I am especially bullish on fusion: its energy density is extremely high, and it could go a long way toward meeting future energy needs. Some companies are also exploring schemes that combine energy generation with data center cooling, for example by using the ocean as a heat sink, which would further lower energy costs. On climate change, I don't think we can simply assume AGI will solve everything. We need both short-term and long-term measures: meet near-term energy needs while actively investing in cleaner, more sustainable technologies such as solar, batteries, geothermal, fission, and fusion. Large tech companies should participate directly in developing and deploying these energy technologies; it benefits their own growth and society as a whole. On AI itself, I believe open source is key to advancing the technology. Meta's release of the Llama models is a good example: open source enables global collaboration, accelerates progress, and lowers the cost for companies to access advanced technology. AI developer tooling also needs to mature to better support data collection, model training, and systems management. Companies should weigh carefully, based on their own needs, whether to build hardware in-house or buy it. Custom chip design can deliver significant performance gains, but it requires correctly predicting how algorithms will evolve. In VR, generative AI will greatly accelerate content creation and improve the user experience. In the future, AI personal assistants will be everywhere, making life more convenient.

Jacob Efron: I talked with Mike Schroepfer about the intersection of AI and energy, including how to produce energy at the scale needed to democratize AI globally, and AI's role in climate change.


Chapters
This chapter explores the critical link between AI and energy, highlighting the energy demands of AI and the need for increased energy production to democratize AI globally. It also discusses innovative energy solutions and their role in addressing climate change concerns.
  • AI's energy demands necessitate a significant increase in energy production.
  • Innovative solutions like fusion and offshore computing platforms are being developed to meet this demand.
  • The intersection of AI and climate change requires a balanced approach, addressing both short-term energy needs and long-term sustainability.

Transcript


Mike Schroepfer was the CTO at Facebook for nine years before founding the venture capital firm Gigascale, where he invests in companies using tech to fight climate change. I'm Jacob Efron, and today on Unsupervised Learning, we had a conversation predominantly around the intersection of AI and energy. Mike shared some really interesting thoughts on what's required to be able to produce energy at a scale to massively democratize access to AI around the world. We talked about the intersection of AI and climate change, and we also hit on Mike's reflections from his time at Facebook and where he thinks spaces like

AI developer tools, VR, and open source models are going. Mike also shared what he thinks the role of a CTO will be like given the changing world of AI and the improvement of coding models. Without further ado, here's Mike. Thanks so much for coming on the podcast. Really appreciate it. Good. Glad to be here. I feel like there's so many different angles that we can take on our conversation. And so a bunch of different threads we'll pull on. But I figured as someone who's done and led technical teams building a lot of really cutting edge products in AI and now super focused on the climate world,

the intersection is a really fascinating place. One thing that's come up on this podcast a few times is what feels like this tension right now between some of the environmental concerns and ESG commitments that companies have made, and then just the massive energy bill that's required for data centers today.

And so curious, like, how you think about that tension, the way companies are navigating this, and, you know, ultimately what's required for the U.S. to produce enough energy for, you know, some of these AI compute build-outs. Yeah. I mean, I think this is, contrary to what some people believe, this is the best possible news we could ever hope for in terms of mutual goals here, which is AI is an incredible technology that's going to allow us to do a lot of interesting things.

And the demand that AI is putting in the near term on sort of energy from customers who have technical savvy and the money and the desire to get this done quickly allow us to deploy a lot of new interesting solutions to power the grid. Even without AI, in the United States, we need about 5x our grid.

you know, by 2050 in order to really solve our goals. I mean, think about converting every gas-powered vehicle to an EV. Think about all the manufacturing we need to do, you know, here in the United States for steel, cement, concrete, and others. That thing, that all takes a tremendous amount of energy. So we need that energy no matter what.

This demand from hyperscalers in particular is sort of pulling forward the opportunity to deploy a lot of new technology and sort of get it down what I care about, get it down these cost curves where the reason why people are going to deploy it in later years is because it's the better, faster, cheaper option, not because it's better for the planet.

Yeah.

Have you seen anything particularly interesting in what folks are doing around data center build-outs? I think the first thing it's done is it's like,

It's caused people to think about this thing that is so much in the back office. It's like you think about power. It's like, where does my power come from? Nobody cares, right? And it's sort of like, how does my package get to my doorstep? Like, no, it's like, does it get there? Does it get there on time? And so we've sort of brought that conversation to the forefront. But I think what it's also done is, you know, it started to ask questions about like, what is the rate limiting step for humanity in terms of like, how do I make forward progress? How do I get 8 billion people to live in comfort and safety?

right? And the answer is technology. It's the only answer I know about. You know, most people who need it don't have air conditioning. Many people don't have clean water. Those are fundamentally power problems. We know how to cool air. We know how to heat air. We know how to take water and make it potable no matter where it came from. The limit on both of those things is effectively the cost of energy going into it. And so if you take energy and you cut the cost of it and deploy it everywhere, you all of a sudden open up the possibility to bring a lot of people

the standard of living that we would all love and take for granted. And we think about AI and you say, okay, great, I would love to have an AI agent running full-time, a full reasoning agent, 24-7, dedicated to solving my needs. How much power does that take? What's a reasonable amount of power to be

burning? Is it a kilowatt, a megawatt, a terawatt? How much energy? And if I multiply that times a billion, 8 billion people, what does that look like? And you start to get to some crazy, fun numbers. But it's very tractable. So I think when you break that down and say, OK, well, what do we need to solve some of these problems? I think there's, to me, it keeps coming back to energy, energy, energy. How do we just make a whole hell of a lot more energy and make it in a sustainable and cheap way? And I think there's basically a few ways to go.
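The "multiply that times 8 billion" exercise is easy to run. A back-of-envelope sketch, where the 1 kW-per-person draw and the grid comparison figures are assumed illustrative numbers, not figures from the conversation:

```python
# Back-of-envelope: power needed for a dedicated always-on AI agent per person.
# Assumption (illustrative): each person's full-time reasoning agent draws
# about 1 kW, roughly the board power of a single high-end accelerator.

PER_PERSON_WATTS = 1_000          # 1 kW per always-on agent (assumed)
POPULATION = 8_000_000_000        # ~8 billion people

total_watts = PER_PERSON_WATTS * POPULATION
total_terawatts = total_watts / 1e12

# For scale (assumed round numbers): average US electric grid output is
# roughly 0.5 TW (~4,400 TWh/yr of generation).
US_GRID_TW = 0.5

print(f"Total: {total_terawatts:.0f} TW "
      f"(~{total_terawatts / US_GRID_TW:.0f}x average US grid output)")
# → Total: 8 TW (~16x average US grid output)
```

Even at a modest 1 kW each, the total lands at many multiples of today's US grid, which is the "crazy, fun numbers" territory he describes.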

I think we have the thing that's working right now, which is solar. 80% of the new energy on the grid in the United States in 2024 is solar, utility-grade solar. Most people probably don't know this. And the reason why is it's cheap. It's just like the cheapest way to put new electrons on the grid. So that's good. Solar is awesome. It works 25% of the time.

You know, nighttime doesn't work, and the winter in the UK doesn't really work. So you have issues there, and it's sort of a bummer to have a $4 billion data center filled with, you know, Jensen's beautiful chips. You don't want those things running 40% of the time; that's not an acceptable answer. You want to run that thing 24-7, right? So you've got this sort of time-balance mismatch.
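The 25% figure is solar's typical capacity factor, and the mismatch can be quantified: keeping a constant load running around the clock on solar means oversizing the panels and adding storage. A rough sketch, where the 1 GW load and the 12-hour ride-through are assumed numbers for illustration:

```python
# Sketch: sizing solar + storage for a 24/7 data center load.
# Assumed for illustration: 1 GW constant load, 25% solar capacity
# factor, and 12 hours of storage to ride through the night.

LOAD_GW = 1.0
CAPACITY_FACTOR = 0.25       # solar delivers ~25% of nameplate on average
NIGHT_HOURS = 12.0

# Nameplate solar needed just to match average energy demand:
nameplate_gw = LOAD_GW / CAPACITY_FACTOR

# Battery capacity needed to carry the load through the night:
night_storage_gwh = LOAD_GW * NIGHT_HOURS

print(f"{nameplate_gw:.0f} GW of panels, {night_storage_gwh:.0f} GWh of storage")
# → 4 GW of panels, 12 GWh of storage
```

In other words, a 1 GW round-the-clock load needs roughly 4 GW of panels before you even account for storage losses or winter, which is why he treats solar as necessary but not sufficient.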

So then you say, okay, well, solar is great. Deploy as much solar as you possibly can everywhere you can to solve it. And then you start looking at other techniques to solve this. And I think one of the ones that I'm most bullish on is fusion. It's a reaction that works. It's in the center of our sun. Humans have made it happen multiple times.

It's incredibly power dense. I have sort of rough calculations that I have to double check, but if you wanted to 5x the power grid in the United States and power all of it with fusion, one super tanker could fuel the entire United States for a year. That's the sort of order of magnitude where a pickup truck would fuel a major power plant for one year. It's just like

bonkers how much energy you get out of a teeny, teeny weeny bit of matter. And there's none of the other concerns with waste and other things that you manage. So you kind of give me a 40-acre plot of land, give me two years to build something, and then I've made a factory that makes energy with a teeny weeny little bit of input in terms of matter. So that is sort of magic.
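The supertanker and pickup-truck claims can be sanity-checked against the physics of deuterium-tritium fusion, which releases roughly 3.4e14 joules per kilogram of fuel. A rough check, with the generous simplifying assumption that electricity out equals fusion energy released (a real plant's conversion losses would raise the fuel mass severalfold, still tiny):

```python
# Sanity check on fusion fuel mass, assuming deuterium-tritium fusion.
# D-T releases 17.6 MeV per reaction from ~5 amu of reactants,
# about 3.4e14 J per kilogram of fuel. Plant efficiency is ignored.

ENERGY_PER_KG = 3.4e14            # J/kg of D-T fuel
SECONDS_PER_YEAR = 3.15e7

# One large (1 GW) power plant running for a year:
plant_joules = 1e9 * SECONDS_PER_YEAR
plant_fuel_kg = plant_joules / ENERGY_PER_KG       # ~93 kg

# 5x the US grid (assumed average US electric output ~500 GW) for a year:
grid_joules = 5 * 500e9 * SECONDS_PER_YEAR
grid_fuel_tonnes = grid_joules / ENERGY_PER_KG / 1000   # ~230 tonnes

print(f"1 GW plant-year: {plant_fuel_kg:.0f} kg of fuel")
print(f"5x US grid for a year: {grid_fuel_tonnes:.0f} tonnes of fuel")
```

About 93 kg fits comfortably in a pickup bed, and a couple hundred tonnes is a small fraction of one supertanker's cargo, so the orders of magnitude he quotes check out with margin to spare.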

There are other really interesting ideas. There's a company out there called Panthalassa, and what they have built is an offshore compute platform. It's a 200-meter-tall thing that bobs in the ocean and generates energy by harnessing the power of waves, basically. But in doing so, you combine energy generation with cooling, because you're in the ocean, so you have immersion cooling in a nice big heat sink right there. And you put this all together and you say, hmm, that might be the cheapest inference plant ever.

It requires a little bit of creativity to think about how to deploy it, but they're out there building these things. And so there's always the opportunity for an interesting new idea to emerge that allows you to build in a way that you couldn't. And if I said, okay, well, if it's 50% or more cheaper, how much more compute capacity could you put online for inference and others?

I mean, given what you said about energy likely being the limiting factor in the future, and I love that people have talked about this idea of UBI being compute, or a certain amount of energy, that you give to the world. So it certainly resonates. Obviously, Sam has invested in some of these energy companies. Do you expect even more of this intersection between

foundation model companies, hyperscalers, and energy providers? Yeah, I mean, you're seeing it already. Most of the hyperscalers have announced purchase agreements for power for either existing or new nuclear power plants, Meta put out an RFP for next generation fission plants. There were several offtakes underway in some of these things. And I think

Again, back to how do we stoke the markets to get these things done is if you take a long-term view on this and anyone who's thinking about AI is like, look, I got to have data. I got to have computation. I got to have the right algorithms. I have to have energy. And that is not a conversation we had five or 10 years ago, but it is a major conversation now is how do we get enough energy at the right prices to allow me to power the training? But more and more, as we're seeing with reasoning models, the compute

demand is moving from training to inference time. And I think that that gets even more interesting because you then move into a supply-demand curve for consumers, which is like, at runtime, I can decide how much compute I'm going to burn for this request. And the cheaper I can make those requests, the more demand I'm going to get, which is like, I think there's a near-infinite supply for inference time, reasoning compute at the right price

point. And so that's when you're not just solving a problem for someone, you're basically enabling growth in the business by giving them lots of energy. That's where companies like Panthalassa or next-generation fusion technologies or others, I think, can really be the thing that makes the difference between one of these hyperscalers growing and not

you know, in the years ahead. Yeah, because at each price point, there's basically some massive new set of questions that would get asked, or use cases. 100%. This is the connection between these. You know, I started in the 90s; the internet came on when I was at Stanford. I worked in the dot-com boom, worked in the mobile transition, worked in the AI transition. And the tailwind behind all of this was the declining cost of compute.

You know, when I was at Stanford, they just stopped requiring you to take assembly programming to graduate with a CS major. You used to have to do it. I took it anyway because it was super fun. But it's like, yeah, like getting every ounce of compute power, you know, cycles out of the compute was no longer useful. Like we could use high-level languages like C, you know,

And of course now it's like who doesn't program in Python, Rust, or JavaScript or some other high-level language that's just like throwing away compute cycles on a daily basis for programmer productivity. And of course now we're moving to AI systems writing a bunch of our code and eventually running our systems. Those are even less power efficient per cycle for the thing they're doing in many cases. And so under this tailwind of sort of compute capacities just allowed us to do so many things. And I think...

I think what I'm looking for in the next 10 or 20 years is if you take that same concept and you say, what if we apply that to energy? And what if energy costs go down 10% a year, year over year for the next 20, 30 years? What happens? And I think the answer is a lot of really amazing things from AI compute to everyone to everyone living in air conditioned comfort to manufacturing stuff we didn't think we could manufacture

fuel. You're kind of articulating that the imperative to work on energy is really around the fact that we have this technology, and it's just not going to be widely distributed and widely available to people unless we figure out things on the energy side. You know, I feel like another part of this is obviously climate change and its impact. And I'm curious, because I feel like a lot of people in the AI community, when faced with this idea of, hey, there's all this natural gas being put online to power these data centers in the short term, they're like, well, yes, that's happening in the short term, but AGI will solve climate change. So no big deal, it's just three, four years.

How do you kind of think about that line of thinking and some of the short-term decisions being made? Yeah, I mean, I don't love the, and then AGI will solve the problem as an answer to anything. I think it's a really useful tool for a lot of things, but I think you're really kind of just punting the problem down the hill. And look, I think...

The answer is: the next five years are going to be messy, because if you want to put a gigawatt of power on the grid in a year or two and run it 70%, 80%, 90% of the time, a combined-cycle gas turbine is a really effective way to do this.

If you don't care about the secondary effects of climate change and all the rest of it. And so I think a lot of people are making rational decisions, saying: we're getting power now; we will buy carbon offsets; we'll do other things later. And so you're going to have some weird inversions where we're going to put some new gas assets on the grid for AI, which we're doing anyway. If you look at energy generation in the U.S., you basically see coal going down and being replaced by gas, and then you see this S-curve on the bottom, which is solar.

And that's going to catch up to us eventually and be real exciting. But I think what you need to do is say, look, we need to sort of do, we can walk and chew gum at the same time. We need to plan for some short-term things and get some stuff online. But let's start putting the markers down on what we want to do over the long run. So cool, 100 megawatts of gas this year. When I put a gigawatt on in 27, 28, what is that?

Is there some combination, like, can I build some solar, which I can build very fast? Can I start trying to look at battery backup as a way to make that solar last longer? Can I be investing in next generation geothermal and next generation fission and fusion? And basically put a marker on, can I pull that technology forward a little bit and or increase the probability that it's actually going to hit? Because if it does, it's a huge enabler for everything I do. Yeah.

Like all of these hyperscalers should be pumped if, you know, one of these fusion companies makes it or if next gen geothermal really breaks out. Because it's just like, all of a sudden, like, great, now I have a whole bunch of new suppliers I can get energy from. That sounds great. It's the gating factor right now or the limiting factor. 100%. And so that's what you see. I mean, thankfully, a lot of them are doing this, right? They're all making bets in these different areas.

I'm curious, as someone who's so close to this technology: it feels like there are lots of experiments being run and different things being tried, and it's actually kind of similar to AI in many ways. Like, if we fast forward three years, we'll know a lot more about where things are going. What are the two or three things we'll learn in the next two or three years around mass clean energy production that feel most top of mind for you?

You know, this is the funny thing: I think a lot of these curves are very obvious if you stare at them closely. I mean, I started working in AI in 2013, which is when we started the Facebook AI Research lab, because that was when the first convolutional neural net won the ImageNet challenge. And it won it by a landslide compared to any prior advancement. And you sort of looked at that and said, okay, well, what's going to make that better? And you say, well, we can add more compute and we can add more data to it, using the same basic algorithm. And you're like, holy crap, we have a lot of compute and data we can add to that.

As big as it was at the time, it was teeny compared to everything else. And so the tailwinds were really, really in your favor. And the funny thing about when I got passionate about all the climate stuff is we have a bunch of these S-curves happening right in front of our faces; our faces are pushed up against the glass. I mentioned solar. We haven't even talked about batteries. We talk about electrifying sort of everything, and doing it because it's the cheapest option. Batteries: the lithium-ion battery came on the market in 1991.

It's not that long ago. It's 97%, 98% cheaper than it was when it was introduced. It's still decreasing in cost by well over 10% a year, year over year. And I think when you think about, again, computation, to me, the advancement is like we poured all this human energy into chipmaking.
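A "well over 10% a year" decline compounds dramatically, and it is consistent with the 97-98% drop he cites for lithium-ion since 1991. A quick check, where the constant 10% annual rate is an assumed smoothing of a lumpier real curve:

```python
# Compounding a steady cost decline, as with lithium-ion batteries.
# Assumption (illustrative): a constant 10%/year decline.

RATE = 0.10
YEARS_SINCE_1991 = 2024 - 1991   # 33 years

remaining = (1 - RATE) ** YEARS_SINCE_1991
print(f"Cost after {YEARS_SINCE_1991} years: {remaining:.1%} of the original "
      f"({1 - remaining:.0%} cheaper)")
# → Cost after 33 years: 3.1% of the original (97% cheaper)

# The same arithmetic applied to his thought experiment about energy
# costs falling 10%/yr for the next few decades:
for years in (10, 20, 30):
    print(f"{years} years at 10%/yr: {(1 - RATE) ** years:.0%} of today's cost")
```

Thirty-three years at 10% per year lands almost exactly on the 97-98% reduction he quotes, which is what makes the "what if energy does the same" question later in the conversation so consequential.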

Like the ASML machine is just like I think one of the most brilliant pieces of technology ever. If you really want to geek out, go check this thing out. Right. And then we pour all this work to make a thing that like I can hold in the palm of my hand. Right. That is what makes AI work. You think about similar things with a lithium ion. Like my car is full of things that look like a AA battery.

And just a lot of them. And that's because we can make a lot of them really cheaply in these giant factories. Solar wafers are not that big. I can hold one and hand it to you. So anytime you find these technologies that have this capability to pour a ton of R&D, scale up manufacturing in them, and just make the same thing over and over again, that is the engine of progress for humanity.

I mean, you mentioned there that obviously when you were heading up, you know, as CTO over at Meta, you saw kind of the early ImageNet stuff and it was like apparent to you, hey, this is something that really could scale. I'm curious, obviously, you know, since that adventure and launching FAIR, Meta's done some incredible things, many of which you oversaw. Yeah.

there's kind of this journey, maybe all the way through open sourcing the Llama models. As you think about that path, was the plan always, hey, we'll invest in this compute and scale up, but do it open source? Yeah, if you look at the early days of FAIR, the work that was done

produced a lot of things like PyTorch, the dominant framework for doing AI development these days. But a whole bunch of other models were released from FAIR, even things like FAISS, which is a nearest-neighbor search library that's down in the weeds of lots of things.

But the way I always think about technical systems is in layers. I think that in the old network days, you used to talk about the multiple layers of the stack. And you think about computers, I've got a chip, I've got an operating system, I've got applications that run on top of it. And the further you are down in that stack, the more commoditized you want to be. Because it's like, look, is every company in the world going to build their own foundation, like train and build their own foundation model?

Probably not, right? Just like most people don't make their own chips, most people don't make their operating systems like, hey, let's all use Linux. It's pretty good. We'll contribute to the parts that we want to change to it. And that's like an aggregation of, again, human ingenuity into one artifact rather than doing lots of copies of the same thing, which is inefficient. And so...

Our vision for AI from the very beginning was like, look, this is a foundational technology. It's going to be input into a lot of things. You're going to use it to make funny videos. You're going to use it to help with health diagnostics. You're going to use it to sort of run the power grid, to discover new materials, all sorts of things. Each one of those applications is going to require a tremendous amount of domain-specific knowledge, right? That a whole company, a whole industry could build themselves around. But underneath that, there's something that's generalizable.

And that's what's been called now the foundation model. And in the beginning, it was like the tools, like PyTorch. That was the first thing. It's like, okay, well, the way we train these models, we can at least share the work there. Let's start there. CUDA is another part of this and other things underneath it. So my thought was always like, let's get that technology. And then from a very selfish, from a standpoint of what's good for meta, my thesis was always, we know how to take this and put this into products in our domain. Perfect.

build great social products, great ad products, great consumer products. And so my goal as CTO is always, how do I make sure that our company has access to the very best technology it needs to build the products we want? And the important word there is access, not we made it, right? I don't really care whether we made it or someone else made it. I care that we can get it.

If it's open source and broadly used, by definition, we have access to it. And if we're helping to develop that open source thing, then not only do we have access to it, we're helping to drive it. It's like the best of all possible worlds. You get a lot of collaboration across the world, and we have sort of obvious access to it at a $0 cost, right? And that is the best possible outcome for the company.

I mean, every time you're releasing these models, people are making all sorts of inference optimizations and finding ways to run them even faster. Yeah. And it also just accelerates. I think it accelerates progress. I think that the definition of American innovation is sort of decentralized innovation, but collaboration at the right points. And I

think open source is one of those. I think scientific publishing is one of those. And the more you can do that, the more I can start by building on your work rather than replicating everything myself, and then we can advance to the end quicker. And to me, that was: let's get to the end quicker. Let's make AI more powerful, more useful. And then we'll figure out how to put it in our products and turn that into returns for the

shareholders. You mentioned PyTorch, obviously, a certain piece of technology that you guys developed and use incredibly widely. As you think about the landscape for developers today and what folks are using, if you were still running that organization, what do you think are the gaps today in the developer tooling, having messed around with these models? Well, I mean, it's

Classic transition to developer tools where you start to sort of move up the stack, right? You know, we'll talk about assembly code to C to, you know, Python. You know, PyTorch is great, but sort of model architectures is not the place generally where people are innovating as much these days. Like we are doing, some people are, but for most people, you know, transformers or others are there. But you start to talk about like,

all of the systems around it. How do I collect my data set, train it for pre-training? How do I do post-training and RLHF and RL? So like, it's more of a system design problem than it used to be. And then how do I manage, okay, this is great. I've got this like 25,000 node cluster, which means that at any one point in time, some of those nodes are definitely down. Like, how do I manage that?

restart, do checkpoints, all of this sort of systems management. So we kind of knew this was going to happen, but it has moved away from sitting at my desk doing work. To bring in an analogy from physics: it's hard to do physics these days without a

you know, super cloud or something. It's not like every physicist has one under their desk, right? You have to have a big machine to do it. And like, we move from, I used to have GPUs under my desk and like do a bunch of work here to like, I need a cluster and I need a bunch of software and other stuff to manage it. So I think it's, you know, that's where it's sort of really moved is the entire system design from training to post-training to inference and how we manage all

all of that. And you were kind of referring earlier to how a key part of the stack here is the hardware itself. And I feel like over the last few years there have been some interesting moves from the hyperscalers in building their own hardware, and I know Meta's done some of this too. How do you think about, or how did you think about, the role that should play when you were there?

In the same way not everyone's going to train their own models, how many people should be building their own hardware? Yeah, it's a great question. It's a thing I talk about a lot because I do a lot of deep tech investing. And so we talk to people, and I think understanding your supply chain, and which parts of it you can outsource and which parts you should own, is a really critical

question for any company. I think the cool thing about technology companies is that you have the choice, right? You don't have to buy everything; you can make it yourself if you need to. And exercising that choice at the right moments is really, really critical. You know, when I joined Meta, Facebook at the time, in 2008, we were leasing all of our data centers. Other people built the buildings, other people built the servers, and we bought the servers and put them in the buildings and put the software on top. As we started to scale up, they weren't building fast enough.

Actually, the data centers we were leasing were terribly inefficient. And so we built our own: a ground-up data center. Then we started building the servers in that data center. Importantly, we didn't build every single piece of equipment in that data center in Gen 1. We started with the web servers, which were, at the time, the largest in number. And then we sort of worked our way through the rest of the stack. These days, it'd probably be hard to find something in a Meta data center that

wasn't designed by Meta, you know, in terms of the boxes themselves. You know, at the time, Intel made great chips, so we didn't need to make CPUs. We could just buy them from Intel because it sort of generally suited our needs. So I think it's really important at all times to say, like, all right, of all the stuff I buy, like, am I getting what I need out of it? And I think, you know, NVIDIA makes just unbelievably great tech. Jensen is a hero, is a friend. He is, you know, a story of compound R&D investment. Like,

The thing he does well is he just reinvests it and builds a deeper and deeper moat. But I think for a lot of these companies, when the cost of the GPUs ends up being a significant line item in the CapEx, you start asking yourself questions like, can we do it better? Is there something I can do to specialize the way my chips are used, to do it cheaper, better, faster? And I think it's a question a lot of folks are asking, and I think there are different answers for different companies about whether that makes sense.

I feel like maybe in the past there would have been more time to breathe and make that decision. You've lived through a bunch of these waves, but it does feel like the pace of change in this wave is so breakneck that you're constantly wondering: if I make the decision to build now and in six months something new is out, will I regret that? And the bummer about chips is, the best way to make

gains, it's really hard to beat NVIDIA with a general purpose chip because they have a bigger R&D budget than you do probably. They have a great team. So if your goal is I'm going to make a better GPU, a general purpose flops machine, I have low hopes that you're going to do anything there.

The only real play is to basically specialize and say, all right, I know this specific algorithm; I'm going to implement it in hardware. Typically, if you take something in software and move it into hardware, you should expect about a 10x performance-per-watt or price advantage. But you've got to guess the algorithm right.

So do I literally embed a transformer architecture in there? And then in two years, when my chip actually comes out, is there some variant of that that's different or some other part of the process that's bigger? So it's a real delicate balance because if you can nail it, you can do a really good job. If you miss it, your chip can be worthless because it literally can't run the algorithm I want it to run. And so I think you're just seeing a lot of people trying to figure out where are the places for leverage here.

Even in the commitments folks had to make on these data centers, right? Years ahead of knowing where things are going. Obviously, it seems like all signs point to scaling laws, but a lot can come out in the next year or two that might change that. Sure. The commitments folks are making two years into the future, I think that's an easier bet, because I think you can repurpose that capacity for lots of other things. The idea that you'll need compute for something is not that hard. And this is why NVIDIA continues to do really well: if you buy GPUs and, let's say, training turns out to be less important, they're not

as efficient for inference, but they can be used for inference. So it's not like you have to throw it away. And so I think that's a relatively safe bet. And I think more people have been frustrated with wanting to do something and then realizing, I mean, this is the hardest part about my job. I remember early on when it was like we were building our first data centers. I was like, well, how much capacity do you need? And I was like,

I was like, "I don't know." You're asking me how many people are going to be on my product in 18 months. There is no way for me to know the answer to that. They're like, "Well, we have to order steel. You can't just order that. I can't go to Home Depot the day before to build you a 150,000 square foot facility. I kind of need to know now." It's just like, ugh, that impedance mismatch of, "I got no idea, but we're working on things that you have to decide two years in advance," which is anything in the physical world.

is a real gut buster for all of my friends at all the hyperscalers. What do you learn about doing that? I mean, obviously, you had to make those calls many times. Don't screw it up. Now, it's painful to underpredict and it's painful to overpredict. But I think I've regretted more underpredicting than overpredicting because it's...

It's just a bummer to see something happening and not have the capacity you need to go after it. If you have extra and you need to deal with it, that's kind of a financial problem. But if you don't have the capacity, it's a technical and a product problem, and that's a much more frustrating thing.

The other thing is, there are lots of tricks. When we look at the hard tech companies, if you think of this as a classic pipeline, back to chip design: I've got a multi-stage pipeline where I've got to go find a piece of real estate, I've got to buy it, I've got to build it, I've got to do all these things. There are tricks you can deploy to optimize that pipeline, to give you some ability to either reduce latency or hide it, or to give you more flexibility at different stages. And so I think we've learned a lot of tricks there that I probably can't share in detail about how to do that more efficiently. But like anything, if you optimize that problem, you can do it really well.

You oversaw so many interesting things, and one of them I wanted to touch on because I thought it would be fun: you obviously had Oculus and the whole VR world, and that's always been a big part of the Meta vision. I imagine there are a lot of cool things you can do now with what these generative models can do, and I'm curious if you could riff for a bit on how you think about the opportunity there now. Yeah, I mean, the vision I'd have is the operator from The Matrix as your AI model,

which is: what you can do with VR in the future is be immersed in any virtual world you want. The real rate limiter for that is the creation of content; it's a lot of work to create really great content.

And so if you can imagine a generative AI system in a 3D world that lets me create anything I want near instantaneously, that is just mind-bending. It's like the Matrix operator call: "Hey, operator, I want this, I want that," and it just appears and I'm in that world. Now, this is not tomorrow, this is not next year, but that is definitely within our grasp in the next X years; I don't know what X is.

And so that is obviously super exciting. And then I think the other thing that's really exciting is the idea of a contextual AI walking with me everywhere I go, right? So if I have a pair of smart glasses that has the ability to sort of help me, whether that be anything from live translation, so I'm traveling somewhere and I want to speak to someone in their native language, I want to interpret a menu, wants to help give me context, it's effectively a tour guide at the same time of what I'm looking at,

or connecting me with friends and family to bring them into that experience, or sharing memories, or remembering them for me. That is going to be pretty transformational. There are so many interesting things happening in AI, but it's still a thing I do on the side: I go text it, or I go to my computer. The idea of it being with me all the time is something we've yet to really experience, although people have tried it in a bunch of ways. I think that's something you're going to see in the coming years.

It feels like folks are still trying to figure out the form factor, but obviously your former employer with the glasses, and lots of people messing around with different kinds of hardware you can go around with. I asked a similar question on the climate side, but on the AI side, everyone's asking where this world is going in the next two, three years. What are the key questions on your mind right now,

that you're most paying attention to, that we'll know the answer to in the next year or two? Yeah, I mean, the obvious thing that's been happening recently is sort of emergent. For a long time, a year or two ago, the big question was the debate of scaling versus algorithms: how far can you go by scaling an LLM?

And I think the answer is we're hitting diminishing returns on straight-up pre-training, just channeling lots of data at a straight-up language model. And so then the question is, what's next? And I think the surprising thing has been treating the LLM as an input into a reasoning model

via post-training RL or things like that, which let you use the LLM but get better results out the other side. So I'm curious how much legs that has. Is that a crank we can turn for two years, five years, 10 years?

How many domains are easy to verify? Exactly. Or are we just missing some other component? Obviously, a big one is memory. The million-token context window from Gemini 2 is pretty amazing, but you'd really want some sort of associative long-term memory, which is what humans have and what we're still missing in LLMs.

And then as soon as you move into domains that are hard to verify, I think you struggle a lot. Math is awesome because you can verify whether you got the answer right. Coding is awesome because you can at least check whether it compiles, and there's lots of data there. As soon as you move into video and other things like that, it's just harder to know how to ground the model, and that's where I think we'll struggle a little. I guess switching gears to your climate focus today,

What applications of AI to climate have you seen that you think are most interesting? I think there's a bunch. The way I like to think about it is: where is there a really critical problem that AI can solve? One is actually exploration. This can come in a variety of forms, but take geothermal exploration. There's a company called Zanskar that basically said, look, there are places where, if you drill in just the right spot, hot water or steam pops out of the ground. It's like a magic source of energy.

And so the question is, can we find more of those places? Instead of just drilling in a bunch of spots, can we use data to analyze it? So they're trying to use AI to find the best sites for geothermal. I think there are a lot of other examples of this: if I'm looking for copper deposits, if I'm looking for hydrogen, whatever it is, rather than digging stuff up and trying to find it, can I use data to find it? That's a huge, really interesting area that I think we're going to see a lot more of.

A little more pedestrian but useful is weather prediction and things like that. Insurance: can I predict risk better for different things? And then I'd say probably more exciting is

that we need either the invention of new materials or the optimization of existing ones. Whether this is for carbon capture or for catalysts for different kinds of reactions, you're in this mode of: I'm searching a multidimensional space for a thing with a certain set of properties. Can I find it more quickly, or produce a bunch of candidates that I can then iterate on and test? So materials discovery and things like that. And even outside of climate,

we just started a company on cancer-cure discovery via AI. I'm optimistic there in terms of finding pharmaceutical targets, among other things.

It's this idea of: I've got this massive space of variables; can I down-select it to 20 or 30 or a thousand candidates that may actually work, and then go test them all? That is a problem space where I think AI is already helping a lot. I mean, this AI materials-science work is so interesting. I think what would be fascinating to see is the milestones that end up mattering to the end companies that would bring this stuff to market. I feel like

in the bio world, traditionally, the problem has been that there are all these amazing target identifications, or things that look interesting, and then it turns out you still have to run five-plus years of trials. And in the materials-science space, it feels like the equivalent is: oh, that's an interesting material, but now can you manufacture it at scale in a way that's interesting? 100%.

And that feels like it might end up being limiting. Even if you can find really cool materials, how many can you, as a company, actually bring to, ideally, an outsourcing partner, and how many can you actually bring to mass manufacturing? In the bio world there have been all these platform companies that are ultimately only able to bring one or two assets to market. I'm kind of curious; you must see these companies all the time. Yeah. And actually, this is why I am bullish on AI in the long run and think it's way overhyped for these applications in the short run, for this exact reason.

And the way I often describe it is: give me a pie chart of all the activities involved in bringing this new material to market, in terms of time and cost, and tell me how much of that pie chart your AI solution actually addresses. In many cases, it's sub-10%. It's like, oh yeah, we have this discovery part here, but then I've got to figure out the rest, and actually the hard part is scaling manufacturing.

And this does nothing for that. And so I have to do all of that work, and then I have to find customers and get out there. And so getting a couple of new discoveries doesn't actually help you. So I think that the solutions that are interesting are the ones that are sort of more direct end-to-end. That's why, for example, this geothermal discovery one is interesting. It's because exploration of new assets is an actual activity that people care about.

It's like, I make that much faster. And it's like, oh, okay, that's really interesting; that's really impactful to my bottom line. So you want to find those connections where you can directly do it. Through a more consumer-y lens, one of the companies we back is by a tech founder: Selina Tobaccowala, who started Evite and worked at SurveyMonkey. She has a product-engineering background, and she's attacking home efficiency and comfort for consumers.

It's like, "Hey, my house is kind of drafty and my power bill is too big. What do I do?" It's like, "Well, you can have a person show up in a truck and drive around and look at your house, or we can have AI do this." You hook a little thermal camera to your phone, you just take a bunch of videos around the house, you send it in, their AI agent processes and sends you back reports. It's like, "Okay, here's the top five things you can do. Add insulation here, replace your fridge, replace your water heater, payback time is seven months."

That's pretty impactful, right? And it's something that, again, most consumers won't wait to have the person show up and pay the 500 bucks for it. But for 50, 100 bucks, they would do something that saves them money with a payback period of six to 12 months. So that is a sort of direct application of, you wouldn't think of that as AI, but it really is AI because that's how we're figuring this out without having a human have to do all the work.
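The payback arithmetic behind that kind of report is easy to make concrete. Here's a minimal sketch in Python; the upgrade names, costs, and savings are entirely made-up illustrative numbers, not the actual product's model:

```python
def payback_months(upfront_cost, monthly_savings):
    """Months until an efficiency upgrade pays for itself."""
    if monthly_savings <= 0:
        return float("inf")  # an upgrade that saves nothing never pays back
    return upfront_cost / monthly_savings

def rank_upgrades(upgrades):
    """Order recommended upgrades by shortest payback first."""
    return sorted(
        upgrades,
        key=lambda u: payback_months(u["cost"], u["monthly_savings"]),
    )

# Illustrative numbers only
upgrades = [
    {"name": "add attic insulation", "cost": 350, "monthly_savings": 50},
    {"name": "replace water heater", "cost": 900, "monthly_savings": 75},
    {"name": "replace fridge", "cost": 1200, "monthly_savings": 40},
]

for u in rank_upgrades(upgrades):
    months = payback_months(u["cost"], u["monthly_savings"])
    print(f"{u['name']}: payback ~{months:.0f} months")
```

With these hypothetical numbers, the insulation pays back in about seven months, which is the shape of the report described above.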

100%. I guess in the future, that will be integrated into our personal agents going around with us and just automatically ordering things. Yeah, as I walk through my house, it's like, by the way, you've got to insulate your front door. It's like, all right, I'll get on that. On the personal side, I'm curious: how do you use AI in your day-to-day life today? Well, it's an interesting question, because I try it on a regular basis for some of the investing work we do.

I mean, this morning there was a great Hacker News post about an undergrad who basically found a new sorting algorithm... sorry, a new hashing algorithm that improves the performance of hash tables. Super nerdy stuff. And one of my favorite uses of LLMs is: take this complicated paper and explain it to me correctly.

And it did a pretty good job of summarizing, but when I asked it to produce examples, they were kind of incorrect. I found that interesting. So for distilling down large amounts of information, I've found deep research actually surprisingly good. In my world, a lot of what we do is thesis investing. Okay, geothermal: what are the right approaches to geothermal? Or what do we think about biochar? And so I've asked deep research to produce reports on all of these things. And I would say

it's equivalent to a pretty smart, non-domain-specific human. If I take a business school student who knows nothing about the field and say, spend a week, research, and then summarize all the different reactor designs for me and give me a guess: that's about how good it was in 15 minutes. It's pretty damn good. I generally use it as an accelerant for summarization and other things, with the lens that I want to double-check all the data, because it's still, unfortunately, not always accurate.

It's a question I always like to ask guests on the podcast. You have kids; I'm curious how you think about how AI will make their lives different, and whether that's impacted the way you parent or the skills you want to impart. Yeah, it's funny. I think the more technology advances, the more the fundamentals are important. What I mean by fundamentals is this: humans are still the best when you're staring at lots of messy data. The more obvious the answer is, the more a machine or an AI is going to figure it out,

and the more you're making a gut call on imperfect data, the more humans stay interesting. So, literally, critical thinking skills: basic math, writing, reading, all the skills involved in consuming and understanding, thinking at a higher level. Both of our kids code. I don't know if they're going to be programmers full time, but understanding how things work matters. And the thing is,

the two skills that I think have served me best in my career, when I look back on the formative things: one is that I was on the debate team in high school. I had a lot of fun doing it. You were a Renaissance person. I was in the nationals for debate. But from the very beginning, getting up in front of a crowd and talking was just not a big deal to me.

It doesn't mean I'm great at it; I'm just not nervous about it. But it also requires you to think about persuasion. I've got to convince you: I'm trying to convince you and your audience that we're at the beginning of the next big revolution, which is energy. And then the second is what computer science and engineering at Stanford really taught me:

we learned in C, and then I learned C++, but I learned really quickly that the language wasn't important. What mattered were the foundations: how does a computer work, and what are different ways to process data? How do I break problems down and understand them? And then, how do I have enough of an ego to say, look, if I think really hard about a problem, I can learn anything?

And so here I am, talking about fusion. I'm not a plasma physicist. I know a reasonable amount about it, and I can learn as much as I want to invest my time and energy in. If I could wish for anything for my kids, it's that skill: just apply your mind to it and you can learn it. Don't do any of that "I'm not an X person" nonsense. When I hear people say that to me, "I'm not a techie," "I'm not a scientist," whatever...

Everyone, I think, has the capacity to learn something if you really apply your mind to it and just start peeling back the layers, understanding it one level deeper each time. That is a skill that I wish more people had.

Hopefully some of these AI tools help there; deep research becomes a really fast tutor. It's like, "Hey, I don't understand that. What the hell does that mean? Explain that to me." That's an amazing sidekick to have: "Okay, I'm not going to judge you for not understanding. Here's the answer." I always like asking it to explain in a simpler way, and then it gets real simple. Yeah, explain it to me like I'm a sixth grader. It's like, "Okay, now that makes sense. Why didn't you start there?"

I mean, I guess you mentioned that coding is one of the areas where these models have made the most progress. You ran these massive tech organizations; as you think about the CTOs of the future, in five years, running these really large teams, what stays the same and what is going to be fundamentally different about those orgs and that role? Yeah. I think it'll be more similar than people think.

Think about the quality of the tools and the size of the code bases in the 90s, when "can we even compile the whole code base?" was a common challenge. Nobody talks about that anymore. The teams have gotten massive and the code bases unbelievably large; we just got better tools. It's like when you're digging with shovels and then a backhoe shows up: oh, that's a lot faster, let's do that. And I think this is just an extension of that.

AI is to coding what JavaScript is to C, and what C is to assembly language. Okay, cool, I'm expressing my thoughts at a higher and higher level of abstraction, and AI is the ultimate version of that: write me a piece of code that sorts the following array in the following ways. That's obviously faster than calling a sort function built into the standard library, which is obviously faster than writing the damn thing from scratch in assembly. But at the end of the day, getting back to our other point,

the actual important thing is: what problems are we trying to solve? What's important to go after? How do I organize this group of smart humans to go after that problem? That's the universal skill: cool, I've got all this cool technology, but did we go really deep on this sorting algorithm, or are we solving the problem for our customers? What's the important problem to solve? When I work with people, one of the things I look for is this: there are people who can run a priority queue in their head, consistently re-sorting it, and they've always got the most important problem at the top.

And there are other people who are brilliant, awesome, can do all sorts of things, but they just get distracted. They're off crushing a 1% problem, and you're like, ah, it doesn't matter; come back to the 99%. It doesn't matter if your thing is 10 times as good. So getting that skill into an individual and into an organization, so you're focused on the most important thing: the more you can do that, the more you can use all these power tools to build an amazing product and an amazing company.
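That "priority queue in your head" line maps directly onto the data structure of the same name. A toy sketch using Python's `heapq`, with hypothetical tasks and impact scores, just to show the constant re-sorting he describes:

```python
import heapq

class PriorityInbox:
    """Keep the most important problem on top.

    heapq is a min-heap, so impact scores are stored negated
    to make the largest impact pop first.
    """

    def __init__(self):
        self._heap = []

    def add(self, impact, task):
        heapq.heappush(self._heap, (-impact, task))

    def most_important(self):
        return self._heap[0][1] if self._heap else None

    def pop(self):
        return heapq.heappop(self._heap)[1]

inbox = PriorityInbox()
inbox.add(99, "solve the customer's actual problem")
inbox.add(1, "make the sorting algorithm 10x faster")
inbox.add(40, "fix the flaky deploy pipeline")

print(inbox.most_important())  # the 99% problem stays on top
```

Adding a new task re-sorts the queue automatically, which is the behavior he's praising: the most important problem is always the one you see first.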

Do you think the size of the organization required to do that changes? Obviously, people talk about the first 10-person billion-dollar company. I mean, I think you're seeing companies come up faster with smaller teams, so yes, I think team sizes will go down.

We always like to end our interviews with a quick-fire round, where I get your take on some overly broad questions that we cram into the end. Overly broad questions and quick-fire answers. So maybe to start: do you think model progress this year will be more, less, or the same as last year? I think it'll be more. What's the go-to thing you try whenever a new model comes out? Summarize a very complicated research paper for me. What's your weirdest prediction on the implications of all this AI progress for the future?

I think most people will have an AI friend.

Do those AI friends also intersect with... I feel like people have tried this, like a group chat with some AI people and some real people. How separate do those worlds end up being? I think maybe you'll call that friend into a group, but if you've ever had the experience of having two different friend groups with two different hobbies, you can bring them together and it's sort of awkward, you know? So I think the actual value is in a non-judgmental, on-your-team friend. What do you want in a friend? I want you to be on my side and have my interests at heart. And I think...

more people will find that with AI chatbots than we realize. I guess reflecting back on your time standing up FAIR and obviously running it super successfully at Meta, hindsight being 20/20, we've learned a lot about AI. Anything you look back on and regret, where you wish you'd done X or Y? I mean, there's a lot to be excited about and proud of. PyTorch,

Llama. I think even in the last couple of years, people have been coming around: when Meta really went all in on open weights for Llama, that was not a particularly common decision, and I think people are coming around to the power of that. So that's one of those long-held beliefs that I think was right; we're on the right side of history there. I do think that,

in this debate of scale-up versus algorithms, there was a period of time when we were having that same debate internally, and I think we probably could have jumped onto the scale-up side a little faster. Well, I always like to leave the last word to you. Where can folks go to learn more about you and the work you're doing today, or anywhere else you want to point folks? The mic is yours.

Okay. I didn't prepare my list of sponsors. I'd like to thank Powerade. Oh, no. In all seriousness: you can find me at gigascalecapital.com. You can find me on LinkedIn. I'm on X occasionally and Threads often. So find me in any of those places. Awesome. Well, thanks so much. This has been an awesome conversation. Awesome to see you.

Hey guys, this is Jacob. Just one more thing before you take off. If you enjoyed that conversation, please consider leaving a five-star rating on the show. Doing so helps the podcast reach more listeners and helps us bring on the best guests. This has been an episode of Unsupervised Learning, an AI podcast by Redpoint Ventures, where we probe the sharpest minds in AI about what's real today, what's going to be real in the future, and what it means for businesses in the world.

With the fast-moving pace of AI, we aim to help you deconstruct and understand the most important breakthroughs and see a clearer picture of reality. Thank you for listening, and see you next episode.