Welcome to today's episode of Lexicon. I'm Christopher McFadden, contributing writer for Interesting Engineering. Today, we're joined by Brandon Lucia, CEO of Efficient Computer and a professor at Carnegie Mellon University, to explore how energy-efficient chips are reshaping the tech landscape. From AI to space exploration, Brandon shares insights into solving real-world challenges with groundbreaking processor design and unparalleled efficiency.
Join us as we dive into the future of computing and discover how energy efficiency could unlock new frontiers in technology and sustainability. Before getting into today's episode, here's something to elevate your 2025. Level up your knowledge with IE+. Subscribe today to access exclusive premium articles enriched with expert insights
and enjoy members-only technical newsletters designed to keep you ahead in technology and science. Subscribe now. Brandon, thanks for joining us. How are you today? Yeah, thanks. It's great to be here. I'm really interested to be talking to you about what we're doing at Efficient Computer. Great. You're very welcome. Can you, for our audience's benefit, tell us a little bit about yourself, please? Yeah.
Yeah, yeah. So I'm CEO and co-founder at Efficient, and for the last 10 years I've been a professor at Carnegie Mellon University. I've had about a 20-year career in computer hardware, computer software,
and energy-efficient and resource-constrained computing. I can tell you more about what that means. Been looking at basically the hardest problems in designing computer systems where you don't have enough energy, you have to fit into a small form factor, things like that.
And, you know, from that research, from that last decade of research we've been doing at Carnegie Mellon, where we were looking at some of those really hard problems, especially in energy constrained computing, we developed the tech that is sort of underlying foundational technology behind what we're doing at Efficient.
Great, great stuff. Right then, building on that, what inspired you to focus on energy-efficient computing, and how does this align with the tech industry's broader challenges today? Yeah, so there's actually kind of an interesting anecdote that goes with that. You know, a long time ago now, it's the better part of that decade that I just mentioned,
we were at CMU. It was me and Nathan Beckmann, another of our co-founders here at Efficient, and Graham Gobieski, who was then our PhD student. He's now our CTO at Efficient. And we were looking at some research problems at that time. And one of the research problems that we'd started looking at was how do you make computer systems that don't have a battery efficient?
And so we started looking at this problem. It's a bit of a niche, but it's a very interesting niche where you try to find, you know, how do you energy-optimize the power system? How do you energy-optimize the computing part of the system and the software and so forth? We started building these systems and, you know, it was a long time ago now, we were doing machine learning on devices that didn't have a battery. Before there was even, you know, this thing called TinyML, we were trying to shoehorn something like LeNet onto those devices.
And what we found is, as we wanted to do more and more computing on the device, which, by the way, is a good idea because it cuts out the need to use the radio, which is a big energy hog.
then you end up hitting a kind of energy bottleneck related to computing on the device. And we got frustrated by this. We wanted to do more, and we ended up only really being able to fit 500 milliseconds of computing at a time on a battery-less device. And then we were storing energy in little capacitors and our capacitors would run out of energy. And so what was frustrating was that
the reason that was so short is the architecture of the chips that we were using on these systems was really inefficient. And so we had this realization that, you know, we'd been doing computer architecture with lots of other goals in mind for, I mean, literally almost two decades at that point, Nathan and I had been at least.
And we realized like, this is an architecture problem. What needs to change is what's inside the chip. And so at that point, we gave ourselves this, I mean, it was, again, this is going back eight years, like this sort of playful ideological goal, which was let's do a clean slate redesign of the architecture. Let's start from scratch. And at every juncture where there is a design decision to be made, we make the design decision, which is the most energy efficient.
And that is not what like the chip industry has done for a long time. The chip industry has done a lot of amazing things for efficiency, for performance. But a lot of times what you see is optimizations that improve performance by, you know, you get an additional incremental 15% performance improvement and you have an incremental, you know, 25% power cost or something. So we, at that point, we started on this mission, which there's a, you know, a very interesting interstitial period and it ends with founding Efficient.
and taking this new architecture that we developed and doing advanced development and commercialization. And that's where we are now bringing this to market, bringing this architecture to market in the product that we developed at Efficient. Fantastic. So yeah, traditionally it's kind of brute force power to get more stuff done, but you're taking it from a different angle. That's interesting. So some are concerned with the risks associated with over-specialization, right?
in AI processors. So how could this limit future technological advancements? Yeah, so I think specialization is, in the toolbox of a computer architect, a very useful tool. Specialization is something we can't avoid, but it's not the only tool out there. And in a lot of cases, it's not sufficient to solve the whole problem. So the way that I like to think about it is this:
we talk to customers and we say, you know, tell us about your computing needs. That's where we always start. We just want to understand: what are you trying to run today? What do you want to run tomorrow? Tell us about your needs, right?
And when we talk to those customers, we get into typically some form of AI. Sometimes it's people that deeply understand machine learning. Sometimes it's people where it sounds like maybe their boss's boss told them, sprinkle some AI pixie dust on this application and make it go, whatever. So we have a different set of people we might talk to at different companies. But when we talk to people about these things,
What they say to us is, well, sure, a big part of it is machine learning. A big part of our AI application is machine learning. And that's what's handled by these specialized AI processors. And those specialized AI processors, they actually work great for the part of your AI application that's actually doing machine learning. So, you know, the sort of geekier way to say that is like dense linear algebra software.
I know that's a little, it's from a math textbook, but that sort of thing and convolutional neural networks and matrix multiply, that's what fits well inside of these AI accelerator chips. What you don't hear about as often, but we do hear about very often is the other part of the application.
which is doing data cleanup, regularization, data format conversions, sort of what we might think of as classical digital signal processing algorithms. And those add up to a substantial fraction of like a real world AI application, especially if you look at physically embodied computing, like edge computing, sensors, things like that. So imagine something getting deployed to physical infrastructure,
You have sensors. You're collecting real data from the real world. It's messy out there. So you bring that data in. You need to clean the data up. You need to get it in the format that your ML model expects. And once that's done, then you push it through the ML model and you can get some on-device AI results. So when we talk to customers, like I was saying before, they'll tell us, you know, it might be 50-50. And if it's 50-50 and you go and use your ML chip to solve the problem,
Even if your ML chip is very fast and very efficient, let's say it makes the time and energy go to zero for that part. Well, you're still left with the other half, right? And that means the best you can do end to end is two times better, because you're just going down to half of what you started with. And this is actually one of the sort of iron laws of computer architecture. It's called Amdahl's law. It's so important it has a name.
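To put rough numbers on that, here is a quick back-of-the-envelope sketch of Amdahl's law using the 50-50 split from the example; the code is just the textbook formula, and the speedup factors are purely illustrative.

```python
# Amdahl's law: end-to-end speedup when only a fraction f of the work
# is accelerated by a factor s; the rest runs at the original speed.
def amdahl_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

# Suppose 50% of the application is ML that a specialized chip accelerates.
for s in (2, 10, 1000):
    print(f"accelerate the ML half by {s}x -> {amdahl_speedup(0.5, s):.2f}x end to end")
# Even as s goes to infinity, the end-to-end speedup is capped at 1/(1-0.5) = 2x.
```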
And we're sort of motivated at Efficient by Amdahl's law for efficiency. On our architecture, we're able to support the entire application: all of the data ingestion, data cleaning, regularization, DSP, all that stuff I described. We handle that very efficiently and with great performance.
And we also handle the machine learning, dense linear algebra part really efficiently. And so we capture the entire application. And by doing so, we're actually able to give much higher end-to-end benefit in terms of efficiency. That's really key to what we're doing at Efficient, actually. So forgive my ignorance then. So the chips that you develop are a bit like a hardwired kind of algorithm on the chip. So it's not like a computer chip trying to run a software algorithm.
You mean our chip or these other ML chips? These ML chips, the specialized chips. Yeah, it kind of reduces the demand on the chip to run, if you like, energy-wise.
Yeah, yeah, yeah. So the deal with specialization is this. It's a good way to optimize because what you do when you specialize your hardware is you tailor the hardware to only support, let's say, one algorithm or one category of algorithms. And what you can do is take out all of the little, there's these fidgety little bits all around in your chip if it's configurable, if it's programmable.
And it's basically the programmability bits. And if you look at the way that CPUs have done programmability for like 75 years, it's like back to the origins of like the von Neumann architecture, like 1950s. It's been the same way for a long time. And those configurability bits, those programmability bits, those are quite expensive, actually.
One way to quantify it, this is just something to imagine in your mind. If you're going two gigahertz, then two billion times per second
you are fetching another instruction out of a little memory sitting next to your processor. That's only one part of the overhead, but that's so expensive. Every single cycle you're going and pulling something. It doesn't matter what you do. If you do it 2 billion times per second, it adds up really quickly. And that ends up being pure overhead. So specialization is a way of sort of eliminating that overhead and
doing a single algorithm, like specializing for convolution, if you want to do convolutional neural nets, you eliminate all of the ability to do configuration. And so you can only ever do convolution. And that's what a lot of the ML chips do. They eliminate all that overhead, but then they have no programmability story. So what's cool about what we're doing at Efficient is we've gotten rid of the von Neumann overheads
but we haven't over-specialized in the way that some of the ML chips have done. So we retain the ability to do programmability and configurability, but we do it in a way that avoids that every-single-cycle overhead while your computation is running. We eliminate that overhead because we have a new architecture that's just built differently.
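To get a feel for the scale of that every-single-cycle fetch cost, here is a rough back-of-the-envelope calculation; the per-fetch energy figure is an assumed, illustrative number, not a measured one for any particular chip.

```python
# Rough sketch: the power cost of fetching an instruction every cycle.
clock_hz = 2e9                # 2 GHz: two billion cycles per second
energy_per_fetch_j = 10e-12   # assumed ~10 picojoules per instruction fetch

overhead_watts = clock_hz * energy_per_fetch_j
print(f"fetch overhead ~ {overhead_watts * 1e3:.0f} mW")  # ~20 mW of pure overhead
# Against a low-power embedded budget of a few milliwatts, that overhead
# alone can dominate, before any useful arithmetic gets done.
```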
Okay, great. Again, you kind of answered this, but why do you believe general purpose CPUs are better suited for addressing AI's evolving demands than specialized processors?
Yeah. I mean, the boiled-down way that I say it, we were at CES this week and it was really fun talking to people about this the whole time we were there. As part of our graphics there, you have a lot of people walking by, and different things capture people's attention. But one that was sort of a repeat in capturing people's attention is this sign we have up. And this is something I really believe at my core, from research and then from my experience running Efficient:
AI, in the way that people see it today, AI is more than machine learning. It's as simple as that. There's more to the story. And so if you use specialized chips...
You are only doing, like, part of AI per se. There's the rest of the sort of real-world aspects to it. And to capture all of the application, if you care about efficiency, if you care about performance for entire real-world applications, you have to capture the rest of that stuff beyond just the machine learning. You have to do all of that other, you know, front-end processing and then, you know, back-end analytics processing too.
So I think generality is really the only way to do that. There's a second thing about this too, which is another kind of big idea around Efficient. And that is: programmers don't like to change the way they do things. People have done things for a long time. They want to keep using the languages they like to use. So, you know, if you're an embedded developer and you're doing sensor development for, let's say, space systems or something like that,
you're probably pretty accustomed to the way you do things. Maybe you start in MATLAB and then you move your code into C, and then you can go and optimize your code in C to run on whatever processor you're going to be using. That might be a very familiar flow. Well, if you're going to use an ML chip, then you have to basically scrap your code and make sure it fits into the idiom of whatever your ML chip does. And usually you have to write some code that sort of patches up between different API calls that are mapped into that hardware.
And so that's been fairly challenging to port an application over. It's not impossible. People do it. But it takes some learning on the part of the developer, there's some friction, there's new tools and libraries and things. One of the things that we're really enthusiastic about around Efficient is providing
the same developer experience. And we want that to be a very pleasant, delightful developer experience for everyone that uses one of our chips. And so what we're after is minimal to no change to your source code. So if you're an embedded developer and you're going to take your C code and you want to just deploy it on one of our chips, our compiler will ingest your code. We do C and C++. You know, the lingua franca for embedded
ML today is TensorFlow Lite, and so we have TensorFlow Lite. That's just part of what we support. But those are the ones that come to mind immediately. You know, if you're a developer doing embedded, that's what you're used to using. And it's general. It's like general-purpose languages, or it's these frameworks that are fairly general.
You don't want to have to rewrite everything in terms of specialized primitives, or you're sort of contorting your application to fit what the system can do. So we put a lot of priority on focusing on the developer and giving them the ability to use the languages and the systems and things that they use already, so that there's a low-friction path to using our chips. Fantastic. That's going to help with the uptake of your technology as well, right?
Yeah, we think so. We think it's going to be good for developers. Like I said, a lot of developers don't like to make changes. They don't want to change how they do things because of what they've learned. I don't blame them either. It's like if I run into a developer who's doing some algorithm development for an embedded system and I'm like, yeah, it's great, but the only thing is you have to become a digital designer and write Verilog.
It's like they're changing jobs at that point. That's why I think FPGAs are another example of an alternative platform. That's literally what they're asking the developer to do: basically change jobs, become a digital designer, and then you can use our hardware. I don't think that works for a lot of people, and it doesn't work for me personally. Fair enough. Yeah.
So building on that, and so there's often a trade-off between energy efficiency and performance, which we've touched on. So how does Efficient Computer overcome this challenge with its fabric processor architecture? Yeah, so the fabric is designed for efficiency. And the way to think about the fabric, I'm going to paint a word picture here. I hope this isn't going to get too complicated. It's fairly simple in the abstraction. So the architecture that we've developed, the way that you could picture it is there's a little grid of squares, like just, you know,
picture a rectangle filled in with those squares, and each of those squares can do an operation. It's not a core, it's like less than a core, right? So it's not like an Arm core or something. It can do a single operation. And so what our system does is our compiler will take your program and digest it down into what we call a data flow graph.
And each of the operations in your program is represented in that data flow graph. And then there are, you can think of arrows connecting those together. And the arrow means that one operation, when it runs, maybe it's like an addition of two numbers, something like that. It produces a result that goes and acts as the input to some downstream operation. And we call that data flow from one operation to the next one.
So that grid of squares again, we take each of the operations and we pin an operation from your program to each of the squares. And we have a very efficient network that connects up the squares so that we can implement the data flow. And what that does is exposes a very parallel execution of the program across the fabric. It also happens to be extremely efficient because rather than repeatedly fetching instructions,
We compile your program to this representation and then pin the instructions in place. And they stay there and they run for a long time. So we get rid of fetch completely. We get rid of instruction decoding, except at the beginning of running a data flow graph. And data movement
is really fine-tuned. It goes from one operation to another operation directly with very little overhead. We're not going indirectly through memory structures. So that's how we get efficiency. And you have the entire data flow graph mapped in parallel onto our architecture, and all that parallelism gives you a big boost in performance if your application has a lot of intrinsic parallelism in it.
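As a very loose software sketch of that mental picture, not Efficient's actual design: operations are pinned in place, and each one fires as soon as all of its inputs have arrived, with results flowing directly to downstream operations.

```python
# Toy dataflow execution: each node is an operation pinned to a tile; it
# "fires" once all of its inputs are available, and its result flows along
# edges to downstream operations. Purely illustrative.
graph = {
    # name: (operation, names of input nodes)
    "a":   (lambda: 3,          []),        # constant producer
    "b":   (lambda: 4,          []),        # constant producer
    "add": (lambda x, y: x + y, ["a", "b"]),
    "sq":  (lambda x: x * x,    ["add"]),
}

results = {}
pending = set(graph)
while pending:
    for name in sorted(pending):
        op, deps = graph[name]
        if all(d in results for d in deps):                  # inputs arrived?
            results[name] = op(*(results[d] for d in deps))  # fire
            pending.remove(name)
            break  # rescan; in hardware, all ready tiles fire in parallel

print(results["sq"])  # (3 + 4) ** 2 = 49
```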
And so together with that, we get a good blend of, you know, extremely high efficiency. When we look at other von Neumann CPUs in the market, we've seen up to 166 times improvement in energy consumption against competitive, industry-leading, low-power embedded CPUs. I mean, that's a sea change in efficiency. If you're looking at battery life, that's an extension of 10 to 100 times your battery life.
At the same time, we see a performance improvement because we're exposing more parallelism than like a sequential von Neumann processor would get if you're doing an embedded CPU type implementation. So they act a bit like nodes then, do they basically?
Yeah, we call them tiles. So each of the tiles that make up that grid, it's essentially autonomous. And on each clock tick, a tile decides, do I have all the inputs to my operation? And if it does, then it can fire the operation.
And so things are mostly decoupled. It's actually this really beautifully elegant abstraction for computation, this data flow model, which is our architecture, where operations proceed as soon as they can, as soon as their inputs are available. It's sort of computation the way it should be, rather than being artificially sequenced by
the von Neumann execution pipeline. So, you know, we see a good balance of efficiency and performance. You know, we have basically category-defining efficiency and great performance, comparable to or better than peers in the market. Fantastic. Okay. And how long can it retain these sections of the code that it stores to kind of... sorry, go on.
That's a good question. So the way that this works, if you zoom in another level, is our compiler looks at your program. And now your program, it might have multiple different functions you're calling. It might have multiple loops in a nest. It might have multiple modules. And the compiler can do some reasoning about it. One way of thinking about it is which instructions will tend to fire at the same time as which other instructions? Which instructions sort of group together?
And that might be, if you're doing some nested loops, it might be the entire loop nest. So if you're doing for I, for J, for K, for whatever, that whole chunk, all of those instructions are going to be firing together and probably sharing values and doing some computation. And so we can do the analysis and figure that out. And it creates little partitions in the program. Each of them is a data flow graph that is sort of independently constructed. And that becomes a configuration for the fabric.
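As a loose sketch of that grouping idea, not the real compiler: instructions that tend to fire together, say everything inside one loop nest, get grouped into one dataflow-graph configuration.

```python
# Sketch: partition a program's operations into dataflow-graph
# "configurations" by grouping instructions that fire together,
# e.g. a whole loop nest. Region labels are made up for illustration.
program = [
    ("load_input",  "ingest"),
    ("clean_data",  "ingest"),
    ("loop_i_j_k",  "loop_nest"),  # the whole nest becomes one graph
    ("multiply",    "loop_nest"),
    ("accumulate",  "loop_nest"),
    ("classify",    "ml"),
]

configs = {}
for op, region in program:
    configs.setdefault(region, []).append(op)  # one configuration per region

for region, ops in configs.items():
    print(f"configuration '{region}': {ops}")
```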
And so the fabric will start out and it runs the first configuration. And the fabric is very clever because what it does is when it realizes that it's done or almost done part of a computation, in a pipelined way, it can start configuring the next piece of the computation, sometimes before the previous one is even finished.
And so with that pipelined configuration, the computation essentially has zero-cycle delay between consecutive portions of your program. So they sort of run hand over hand with one another in a really general-purpose way. So when we say this is a general-purpose processor, it's not like,
you know, a great accelerator tightly coupled to a CPU, and it's not like this is a new kind of CPU. This is a really new kind of general-purpose processor that runs according to this data flow architecture model that we've developed, and it'll run any code you throw at it. Sounds really slick. Yeah, really slick, amazing. Okay, great.
So what are some real-world applications where the Efficient processor has had the most significant impact, and how do you envisage its broader adoption? Yeah, also a good question. I like to say it this way. Since we've started doing advanced development and commercialization, this is the last three years of Efficient, after nearly 10 years of research at CMU, we've been getting a look at the real world. Where do these devices go? Where do chips go when you put them in, you know,
in sensors or in edge computing applications? And there's a few areas where we've seen a lot of traction. So the biggest areas are in infrastructure sensing, infrastructure intelligence, enterprise-scale IoT installations and sensors, those sorts of things. In space and defense applications, we have a lot of traction with space.
I would love to talk about space if you have questions about that. Big interest of mine; in my capacity at CMU, I still maintain a small space program that we run there. Really a very cool use case for efficient computing. And another area is sort of health and wearable devices. So those are three major areas that we've been looking at. I'd be happy to talk about any of those in some more depth. Yeah, well, since we're on the subject, let's go on with space then, sir. Sure.
Yeah, yeah, yeah. I'm glad you asked. So in space, the most important thing in a lot of space missions is what the defense world calls SWaP, which is size, weight, power. And there, you just need to basically minimize. You want to have the lowest power, the smallest amount of junk you need to launch. You want it to be lightweight.
And when you think about doing SWaP minimization, that corresponds to launching fewer batteries in a satellite, and it corresponds to smaller solar panels on the satellite. Those are major sources of complexity. For example, if you're doing CubeSats and you can go from a deployable solar array
to a purely surface-mounted solar array and do the same job, that's a complexity decrease. It's a cost savings. It reduces the risk of your mission because now you're not relying on some deployable. The way to do that is to decrease the energy consumption
for whatever you're trying to do. So, you know, one of the things we've looked at is doing earth observation applications. There's a broad range of activities you can do with that. Some on the defense side, you can do climate science. There's all sorts of really useful things, urban planning, smart cities that you could support with satellites, just using visual cameras and doing earth observation.
And so you think about what it takes to do that today. A lot of systems, you know, people are envisioning, oh, we're going to launch three Jetsons inside of a 6U CubeSat. And we're going to use a camera that's going to have, you know, 30 centimeters per pixel. And we're going to transmit things down as quickly as we can. That's a huge energy hog. I mean, that's like
tens of watts' worth of embedded GPUs. And on top of that, you have the cost of communication. You have to nail it with communication, because if you miss your window, it could be 45 minutes before you even have a communication window again. So the space industry is learning it's better to do a lot of processing on board because of that communication challenge. But the hardware, it's like we're adapting. We're using what's available in the market. We're grabbing whatever GPU or whatever fits in the box.
It's kind of unsatisfying because that ups your power consumption. And for a CubeSat mission, your onboard compute becomes the highest power consumer inside the box.
That's very unsatisfying because we can do that much more cheaply and we can do more with less. And that means we can add capability to these satellites. So for efficient, one of the things we're looking at is how do we take the DSP computation for the RF stack? How do we take the DSP for visual analysis of images and the machine learning part of those?
There's often some fairly sophisticated backend analytics that can involve sparsely encoded graph problems, things like that, on the data once you've processed those sensor data inputs. A whole slew of things. Satellites do a lot of different kinds of computation. Control loops, for example.
So you put all that together, and the Efficient chip handles those. That's what it's good at: this diversity of compute-intensive applications. And that's the name of the game when you're in a satellite, because you don't have easy access to communication to offload.
So we're a great fit. I'm super excited about the prospect of taking efficient silicon and in the next probably 12 to 24 months seeing something on orbit, seeing efficient silicon on orbit. I can't talk specifically about how we're doing that. I would love to, but I'm very excited about the prospect. We have a few different angles on that. It's a natural fit for the technology that we've developed because of the huge advantage we have in energy and power efficiency. Fantastic.
As you were talking, I was envisaging them being used on deep space probes, or areas where, well, solar energy is limited or weak. It could have applications there, right? I presume. Absolutely, yeah. One area where I'm fairly excited, and this is something that we've looked at through the space program at CMU that I mentioned,
is doing visual navigation for lunar missions. So, terrain-relative navigation for lunar missions, where there's not good infrastructure on the moon, because why would there be, right? And so you can use terrestrial infrastructure, but that's very far away. You can use the Deep Space Network, but there's a high cost associated with that, and it's difficult to get access, and there's a lot of issues. It's great when it works, but, you know,
you don't always have access to that. So instead you can do terrain-relative navigation, or you can do visual-only attitude and orbit determination in a lunar environment. That's a very exciting application. There's a lot of downstream use cases there in terms of lunar exploration. And this is just one example. There, when you're on the far side of the moon, you're completely in the dark. So power efficiency is going to be the most important thing. You need to conserve
all of the energy that you have for the instruments that you're running and for the computation to process the data from those instruments. Otherwise, it's a blackout. We've seen satellite examples, I can't say specifically which ones, again, where you're at a 2% to 10% duty cycle, which means that between 90% and 98% of the time, your satellite's just turned off.
You can't support the power consumption to keep all your stuff turned on. That's very disappointing. So you make this thing more energy efficient and you can keep it on for substantially more than that. Even if the aspirational target is 50%, 30%, you're still substantially better off than before. And so this is representative of the kind of potential in the technology that we've developed. It's not just for the moon. There's plenty that we can do in LEO and closer-to-Earth applications,
looking down with instruments that can detect different spectra, from RF to visual and everything in between. So a lot of really exciting use cases, you know, civilian and defense, all over the board. You could have big applications for drones as well, right? Especially in military or, as you said, civilian too.
Presumably. Yeah. Yeah, I think that's not an area we're actively developing as much, except inasmuch as there's a similarity to the satellite use case. But if you're doing visual processing, you're in an energy-constrained environment, you're similarly SWaP-constrained. So really, the way to think about it is anywhere that you're highly SWaP-constrained. So if you're flying, absolutely. If you're launching something to orbit, absolutely. Another one is wearable applications, in civilian use cases and defense use cases.
Because on the defense side of things, you hear these anecdotes and they're kind of frightening. It's like you have to carry around 35 pounds of batteries with you when you're hiking around. So imagine you've got to go and hike 10 miles to some location with 35 pounds of just the batteries, by the way. That's not even the real equipment. It's just the overhead. So that probably varies by mission. But if we can decrease that to just 25 pounds...
then that makes a big difference. And so a lot of that energy is going into computation and communication. Those are the two things you're spending that energy doing. If we can take the computation part and decimate the energy cost, make it 10% of what it is, make it 1% of what it is by improving the efficiency, that's a clear benefit. I mean, if you look at it from the other perspective, if you have a smartwatch and you're limited in
battery life by the computation happening on the device, now, it's not just computation, I know, it's also communication and lots of other things, but if you just think about the computation part: if you're lasting a day, two days with your smartwatch, maybe, you know, some devices you end up having to plug in every nine hours after you have some battery degradation. What if we extend that to a week? It's like you forget you have to charge it after a while. What if we extend it to a month? You completely forget that you have to charge it. It's a once-a-month thing.
It's really a game changer when you think about how the efficiency translates to real-world benefit for these kinds of applications. On the subject of wearables, would it be possible in the future to have, say, a smartwatch that doesn't require a battery, but can get enough power from your movements
to actually power itself with your kind of chip technology? Would that ever be possible? Maybe. I can speculate. So in my career, I have looked at energy harvesting computer systems. At CMU, again, this is going back a few years, but we developed a model of computing that's called intermittent computing. And the idea there is you compute when energy is available in the environment, and when there's no energy available, you sort of shut down and wait for more energy. And with that kind of model,
It depends on what you're trying to do, but you can harvest even power sources as low as ambient radio waves in the environment. So if there's like a megawatt TV tower somewhere in your city, you may in some locations be able to harvest enough energy from those radio waves to charge up a capacitor and then turn on a computer and drain the energy in the capacitor. So I don't know if that's going to be a smartwatch exactly because there's a lot of things you want your smartwatch to do, but
Wearable sensors, wearable health sensors, or even here's another one. This is very exciting. Whether energy harvesting or not, environmental and infrastructure sensors. These things today, you have to go out and swap batteries all the time. Imagine we have batteries in these devices and we improve the efficiency of computation to the point that you almost never want to communicate because computing on device is so much more efficient.
So say we do that. Now, you know, on a AA battery running a machine learning workload on Efficient, we can last in the neighborhood of five to 10 years, five-plus years. I'll say it depends on, you know, the workload that you're trying to run on a single AA battery.
Imagine supplementing that with energy harvesting to make things last a bit longer. And now you could be pushing a decade on a single deployment. Think about what that does for deploying environmental and infrastructure sensors. You want to go and listen for poachers or we've come across applications that are like listen for wildfires or listen for chainsaws and protected forests or whatever. There's all sorts of applications. Infrastructure, like listen to the pipelines, listen to the power grid.
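For a sense of the AA-battery arithmetic behind those lifetimes, here is a rough sketch; the capacity and the average power draws are assumed round numbers, not measured figures for any particular chip or workload.

```python
# Rough sketch: how average power draw maps to lifetime on one AA cell.
battery_wh = 3.5               # assumed AA alkaline capacity (~2,300 mAh at 1.5 V)
battery_j = battery_wh * 3600  # ~12,600 joules

for avg_power_uw in (500, 100, 40):          # assumed average draws, microwatts
    seconds = battery_j / (avg_power_uw * 1e-6)
    years = seconds / (365 * 24 * 3600)
    print(f"{avg_power_uw} uW average draw -> ~{years:.1f} years")
# Holding a duty-cycled workload down near tens of microwatts average is
# what puts 5-10 year lifetimes on a single AA battery within reach.
```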
You can deploy these sensors and you're not upside down in terms of the OpEx. Because if you don't have a long battery life, then you're sending someone out to go and change batteries,
you know, every minute of every day. So the OpEx math just doesn't work out. If you make the devices more efficient by improving the efficiency of computing, you make these applications possible for the first time. That's the change here. Efficiency isn't just an incremental improvement. It unlocks this whole category of applications that are literally not possible today. That's what gets me so excited about these use cases. That's cool. I mean, if you're able to get there,
you could have a position where a sensor is sensing something, and it's harvesting energy from that, where it would never, ever need any battery storage at all, because it's getting energy from what it's sensing.
Yeah, yeah, battery-less devices. You can harvest energy from the same spectrum you're sensing and you can store up a little bit of energy in a capacitor. No need for a battery at all. I mean, there's a world where the environmental sensors, that's the way they work. Put a small solar panel out there, and basically then you're limited by the lifetime of the solar panel. You're limited by the lifetime of whichever component fails first on your board. That's a very interesting category of applications. Yeah.
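A minimal sketch of that intermittent-computing pattern, with made-up capacitor and harvesting numbers: bank harvested energy in a capacitor and only compute when enough has accumulated, using E = ½CV².

```python
import random

# Toy intermittent computing loop: harvest ambient energy into a capacitor
# and run short bursts of computation when stored energy crosses a threshold.
# All component values are assumed, for illustration only.
C = 100e-6                  # 100 uF storage capacitor
V_FULL, V_MIN = 3.3, 1.8    # usable voltage window
E_BURST = 0.5 * C * (V_FULL**2 - V_MIN**2)   # energy per burst, ~0.4 mJ

stored_j = 0.0
for tick in range(10):
    stored_j += random.uniform(0, 2e-4)      # energy trickling in (RF, solar, ...)
    if stored_j >= E_BURST:
        stored_j -= E_BURST
        print(f"tick {tick}: capacitor charged -> run one burst of computation")
    else:
        print(f"tick {tick}: not enough energy, sleeping")
```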
Now, that's not front and center for Efficient yet, but who knows? Maybe that's the future of where these industrial infrastructure sensors go. Today,
typical installations need the reliability of something that's backed by a battery. So what we do is eliminate the need to burn up all your energy sending packets over Wi-Fi. You can compute locally and you save just a huge amount of energy, 10 to 100 times battery life extension compared to other embedded low power CPUs. And I'm talking about running the same code. You don't have to change your application. In fact, you can add features to your application and you can do it in the way that you normally would. So
It's really, yeah, like I said, it's really game changing when you think about the efficiency affecting these applications in the real world. Absolutely, yeah. That's blown my mind a bit. Right. So what have you found have been the biggest challenges in developing and scaling efficient computer and how have you overcome them?
Yeah. So I think if you talk to 10 startup founders, you'll get 10 different answers. For us, I think that we have an amazing team.
I feel so lucky, and the people that we found are just absolutely amazing at the job that they do. And it's been through a lot of hard work and a lot of, you know, grit and network effects to find the right team to do the work that we're doing. That's a challenge anywhere, but, you know, we have a pretty specialist skill set, and I think, yeah, the team that we have is really just absolutely crushing it. That's one of the hard parts. Building the right team is definitely one of the hard parts.
And I think we're doing it amazingly well. Like I said, our team is just an absolutely fantastic group of engineers. They know hardware, they know software. Many people on the team know the stack from the very top to the very bottom, including the business side. It's very cool. I love working with the people that we have at Efficient. The other side of it is just the realities of going and bringing a silicon semiconductor product to market. There's just a lot of work.
It's difficult. It's difficult to build something that's going to be useful in the real world. And it's something that, you know, I have a career in the academic world and it's easy to sort of paper over some of these real world concerns when you're on the academic side of things.
To the point that maybe it's a little bit frustrating to see work that does that. But when you get to bringing a product to market, you face those realities and there's a lot to learn because some of these are really fundamental problems. And some of these are things that are just so essentially important to making this valuable in the real world, making this valuable to actual customers.
So, you know, getting your arms around some of these slippery real world issues like specific design constraints, implementation constraints and coordinating all that along the way. That's one of the hard things too. Just making that all come together. Now, something that is fairly unique about Efficient that I think we've done really well and this would be one of the hard parts, except again, I think we have the right team to do this.
is we're a hardware company and a software company. And I think a great recipe to faceplant as a hardware company is to neglect software even for a minute. I think software, especially for general-purpose computing hardware, software is just
so, so critically important. Our compiler team has been, you know, working from the very beginning. Some of our first hires were in the compiler area. We've been developing some of the abstractions that ended up in the compiler for, you know, the better part of a decade, starting all the way back in the research. That software ecosystem, that's what makes the hardware actually go. This is, I think, one of the biggest challenges in general for trying to field a general-purpose
semiconductor product: getting the software story right. I think that our team is just absolutely world-class and is crushing it at doing that. But that isn't easy either. I would say that's something that, if you're out there thinking of doing a hardware company, make sure that you have your software story sorted out. Fantastic. All right. So the next question: with AI evolving rapidly, what do you think the next major shift in computing technology will be, and how is Efficient Computer preparing for it?
So this will sound familiar from what I've been talking about, but I think that we're in the era of heavy specialization. We've seen a lot of efforts go into designing
fixed-function accelerators, highly specialized accelerators, and accelerators that work for, you know, one part of AI. Again, like, AI writ large, there's a lot more to it than just ML. And I think that what we're going to see, you know, in the next five years, maybe 10 years, is a shift to look at what are all the other hard parts of computation. You know, we nailed it on matrix multiply, like, hooray.
We, you know, mission accomplished on matrix multiply. I think we got it, guys. But I don't think that everything begins and ends with matrix multiply. I don't think everything begins and ends with convolution. Like I was saying before, when we talk to customers, they tell us a very different story. A lot of it is: we need to do irregular analytics and DSP and all these other computations. So I think what we're going to see is a pendulum swing away from hyper-specialization.
And when we start to do that, I think the winner is going to be the one that is thinking the most about energy. And that's self-serving; I appreciate the fact that our mission is generality and efficiency. But I think that it's the natural consequence of swinging back from specialization, because specialization eliminates all of that configurability and drives performance and energy basically to the maximum.
If you come back the other way and now you want to support a broader array of computations, you're either bringing together many accelerators. I don't think that's the right model because that increases complexity and it will increase energy consumption because you have to move data between those accelerators. So what you're left with is the third option. And this is what Efficient is doing. I think that this will be the problem of the future is energy efficiency and the generality to avoid the trap of over-specialization.
I think that's going to be the problem for the next five years, maybe 10 years, maybe beyond. Energy
is going to be the only thing that matters. We're seeing this in the data center. Like this is becoming clear in the data center as people are talking about, you know, scratching their head and saying, do we need a new nuclear power plant to go with this data center? I mean, that is so shocking to hear that kind of speculation that that's what we're going to need in the future. Whether that's true or not, I don't know. Maybe we need to scale up to the point to support all of the applications we need to run where that ends up being a good idea.
That seems extravagant and enormous, but maybe that's what's actually required. But I think especially, you know, as we move out of the data center and into the edge, and into sensor devices, these sort of tiny devices that are physically embodied and disappear into the world, the ones like we've been talking about, their energy matters more than ever, because you're typically constrained entirely by energy.
And I'm talking about batteries. I'm talking about, you know, far away installations of things. And so energy is going to be just the most important thing across the board. Energy, I think we're in the era of computing where energy is the only thing that matters.
Yeah. So, yeah, you're probably going to see a big explosion, obviously, with AI, with large data centers needing more and more power. It's going to get to a point, and then energy efficiency will become more important after that period. You'll have an explosive starting period as AI really embeds into society, and then there'll be more of a push or focus on the energy efficiency side, right? Presumably. Yeah.
Yeah, I think so. I mean, I think we're there. We're at the cusp. And this has been our guiding principle: to be motivated by energy efficiency. I think people are now, we're seeing this, you know, even popular media is covering it, people are seeing the true cost of all of this amazing AI magic. And it comes in the form of energy.
And I think that that is something we're going to have to reckon with, especially, you know, as you just said, and as I was saying before, we see more physically embodied use cases for computation, including, you know, machine learning and AI, but lots of other things too. We get these applications out there. They're not going to consume less energy. It's going to be more.
And so we need to keep pushing the envelope on energy efficiency. You know, the first era of computing going way back was figuring out how do we do general purpose? Then after that, we got into an era of how do we push performance? Now we're in pushing performance through specialization. We get some efficiency, but it runs out. We need generality. And when we go back to generality, it's going to be energy. That's the only thing that matters.
Yep. Yep. Very true. Right, last question then, a bit off topic, but as a professor and startup CEO, what advice would you give to researchers and entrepreneurs who want to bridge the gap between academia and industry? Oh, well, I think that I have learned an enormous amount from my career in the academic world.
I have loved being in the academic world and I continue to love being in the academic world. It's a delightful place to work and there's so much to learn and so many creative people.
I have taken lessons from there, like the ability to take risks and work under uncertainty, and sort of ported those skills over into the startup world and molded them into something that works on the side of industry. And likewise, I think that running a company and doing advanced development and commercialization has helped me to understand problems in the real world and understand, you know, what does the world care about? What do people care about? What really matters for applications? And to bring that in.
It sort of sheds light on my academic interests, and even the kinds of problems that I select today to work on and research in the academic world are influenced by that. So I would say, you know, if you're an aspiring academic thinking about, should I do a startup? First of all, make sure this is not, you know,
"Yo, I have a billion-dollar app idea." Make sure it's actually a good idea. Make sure it's something that you really believe in. I can say wholeheartedly that Efficient is something that I really believe in. I think that what we're doing is different. I think it is highly differentiated. I think we're going to change the landscape of computing with what we're doing. That's what motivated me
to go and start this company with Nathan and Graham and Alex Hawkinson, our fourth co-founder. That's what motivated us to take the plunge and to dive in and make this happen. If you feel that, you should do it. There's ways to make it work. I mean, honestly, reach out to me personally. If I have a minute, I'd love to send you a note and give you my two cents for where you're at in your career and thinking about it. But more broadly, find someone near you that can help you decode the riddle of how do I turn this
sort of maybe a little bit esoteric and not quite product-ready academic idea into something that you could use in the real world, and make it into a company. There's a lot of resources out there. There's a lot of people that want to help you learn. And, you know, as with anywhere, I think most people want most other people to succeed. It seems sometimes like you're getting beat over the head and the world is adversarial, but I think if you go and do a poll, most people actually want most other people to succeed.
It doesn't always happen that way, but most people are positive in that way. And so if you reach out to mentors, if you reach out to friends who understand this world, don't be afraid to do that. People, more than you might imagine, are willing to give you time; even 15 minutes might help you to get oriented. So just take the plunge. If you really believe in an idea, take the plunge.
Something I should have asked is: do you recommend it? That probably should have been added to that question. Yeah, I mean, it's been a real learning experience. It has ups and downs, like everything I've ever done has had ups and downs. One way I like to say it is: I have a three-year-old son and a three-year-old company, and I don't know if I would recommend that kind of staging of major life events.
Yeah. It was a lot to take on at once. But, you know, of course I love my family. I'm a total family man. I love Efficient and I love what we're doing here. So, you know, no regrets. But it definitely is a big time sponge across the board. I would say both of these things, all of these things, are a big time sponge. Just be ready for that. That's something I would say if you're thinking about it also. Be ready for that. Absolutely. Advice for us all, I think. Yeah.
That's the end of my questions. Is there anything else you would like to add that we haven't discussed? No, I mean, just, I think, you know, what we're doing at Efficient, like I said before, is something that's really new. I think what we're doing is massively differentiated in terms of the technology itself. The architecture is fundamentally different, and the benefit that we bring is, you know, that one to two orders of magnitude improvement in energy.
This is what real-world applications need, especially these sort of physically embodied intelligence applications. So I'm really excited to see our work, you know, putting this technology out into the world. We're going to have silicon available mid-year, and for our early access customers, that's going to be landing. Contact me if you're interested; that's addressed to your audience, I hope that's okay. We're, you know, accepting customers into our early access program now, and we're going to be pushing products at the end of '25, early '26 into the broad market.
Very excited to see that happen and to see the use cases start to pile up and see real value in the real world from our technology. Very excited about that. Fantastic. Very best of luck to you. I hope it succeeds. Basically, yes, with that, thank you for your time, Brandon. That was very interesting. Yeah, thank you very much. I appreciate you having me on your show. Our pleasure. And that concludes this episode of Lexicon. Thank you all for tuning in and being our guest today.
Follow our social media channels for the latest science and technology news. Also, don't forget to subscribe to IE Plus for premium insights and exclusive content.