Welcome to our Bloomberg Radio and television audiences worldwide. We go right now to a conversation with Matt Garman, AWS CEO. Matt, it's good to catch up. It has been basically one year that you've been in the role as AWS CEO. As a place to start, what has been the biggest achievement in that time for AWS?
Yeah, thanks for having me on. It's nice to be here again. Yeah, it's been a fantastic year of innovation. It's really been incredible. And as I look out there, one of the things that I've been most excited about is how fast our customers are innovating and adopting many of the new technologies that we have.
And as you think about customers that are on this cloud migration journey, many of them have been doing that over the last several years. But this year in particular, we've really seen an explosion of AI technologies, of agentic technologies, and increasingly we're seeing more and more customers move their entire estates into the cloud and onto AWS. So it's been really fun to see. It's been an incredible pace of technology, and it's been a really fun first year.
The moment that investors kind of sat up and paid attention was when Amazon said that its AI business was at a multi-billion dollar run rate in terms of sales. What we don't understand as well is what proportion of that is AWS infrastructure?
Yeah, that is AWS, right? And so the key is that's a mix of customers running their own models. Some of that is on Amazon Bedrock, which is our hosted model service, where we have first-party models like Amazon Nova as well as many of the third-party models, like Anthropic's models. And some of those are applications, things like Amazon Q, which helps people do automated software development, as well as a host of other capabilities. And so there's a mix of that. And I think part of the most interesting thing about being at a multi-billion dollar run rate is that we're at the very earliest stages of how AI is going to completely transform every single customer out there. We talk to customers and we look at where the technology landscape is headed, and we firmly believe that every single business, every single industry, and really every single job is going to be fundamentally transformed by AI. And I think we're starting to see the early stages of that. But again, we're just at the very earliest stages of what I think is going to be possible. And so that multi-billion dollar business that we have today is really just the start.

Can you give me a generative AI revenue number?
For the world or for AWS?

For you guys, for AWS. Maybe Amazon as a whole.

Yeah, like I said, we are in multiple billions of dollars, and that's for customers using AWS. We also use lots of generative AI inside of Amazon for a wide range of things. We use it
to optimize our fulfillment centers. We use it when you go to the retail site to summarize reviews or to help customers find products in a faster and more interesting way. We use AI in Alexa, in our new Alexa Plus offering, where we conversationally talk to customers through the Alexa interface and help them accomplish things through voice that they were never able to do before. So every single aspect of what Amazon does leverages AI.
And our customers are exactly the same. Customers are looking to AWS to completely change how they operate, whether it's their contact centers through something like Amazon Connect, which surfaces AI capabilities so that you don't have to go program it yourself, all the way down to our custom chips or NVIDIA processors, where customers close to the metal are building their own models. We have the whole range of people building AI on top of AWS, as well as Amazon itself.
We always credit AWS as being the number one hyperscaler. But to your point about what clients are using, from the silicon level through to capacity, it would really help if you could tell me proportionately: what percentage of workloads are being run for training, and what proportion are being run for inference?

Sure.
Yeah, and that changes over time. I think, look, as we progress over time, more and more of the AI workloads are inference. I'd say in the early stages of generative AI, a lot of the usage was dominated by training, as people were building these very large models with small amounts of usage. Now, the models are getting bigger and bigger,
but the usage is exploding at a rapid rate. And so I expect that, in the fullness of time, 80 or 90 percent, the vast majority of usage, is going to be inference. And just so everyone out there knows: inference really is how AI is embedded in the applications that everybody uses. And so as we think about our customers building, there's a small number of people who are going to be building these models, but everyone out there is going to use inference as a core building block in everything they do. Every application is going to have inference, and we're already starting to see inference built into every application. We think about it as just the new building block. It's just like compute, it's just like storage, it's just like a database. Inference is a core building block. And so as you talk to people who are building new applications, they don't think about it as AI is over here and my application is over here. They really think about AI as embedded in the experience. And so, increasingly, I think it's going to be difficult for people to say what part of their revenue is driven by AI. It's just part of the application that they're building. And it's going to be a core part of that experience, and it's going to deliver lots of benefits, from efficiency, from capabilities, and from user experience, for all sorts of applications and industries.
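To make the "inference as a building block" idea concrete, here is a minimal sketch of an application that treats a model call the same way it treats a database read. It assumes the standard boto3 Bedrock runtime client; the table name and Nova model ID are illustrative placeholders, not details from the interview.

```python
# Minimal sketch: inference as just another application building block,
# alongside a database read. The table name and model ID are hypothetical
# placeholders; assumes AWS credentials are already configured.
import boto3
from boto3.dynamodb.conditions import Key

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

def summarize_order_history(customer_id: str) -> str:
    # Building block 1: an ordinary database query.
    orders = dynamodb.Table("orders").query(  # hypothetical table
        KeyConditionExpression=Key("customer_id").eq(customer_id)
    )["Items"]

    # Building block 2: an inference call, used the same way.
    response = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",  # assumed model ID
        messages=[{
            "role": "user",
            "content": [{"text": f"Summarize this order history in one sentence: {orders}"}],
        }],
        inferenceConfig={"maxTokens": 256},
    )
    return response["output"]["message"]["content"][0]["text"]
```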
But present day, is it fair to say the majority is still training?

No, I think that at this point, definitely more usage is inference than training.

We want to welcome our radio and television audiences around the world. We're speaking to AWS CEO Matt Garman, who next week officially celebrates one year in that role leading AWS. A new metric that has been discussed, particularly this earnings season, and one we discussed with NVIDIA CEO Jensen Huang this week, is token growth and tokenization. Does AWS have a metric to share on that front?
I don't have any metrics to share on that front, but I think one of the measures we can look at is the number of tokens being served out there. It's not the only one, though, and I increasingly think that people are going to be thinking about these things differently. Tokens are a particularly interesting thing to look at when you're thinking about text generation, but not all tokens are created equal. Particularly as you think about AI reasoning models, the input and output tokens don't necessarily reflect the work that's being done. And increasingly, you're seeing models that can do work for a really long period of time before they output tokens. And so you're having these models that can sometimes think for hours at a time, right? You might ask these things to go and actually do research on your behalf. They can go out to the internet, they can pull information back, they can synthesize, they can redo things. If you think about coding and Q Developer, we're seeing lots of coding where it goes and actually reasons, does iteration after iteration, improves on itself, looks at what it's done, and then eventually outputs the end result. And so at some point, the final output token count is not really the best measure of how much work is being done. If you think about images, if you think about videos, there's a lot of content being created and a lot of thought being done. And so tokens are one aspect of it, and that's an interesting measure, but I don't think it's the only measure to look at, although they are rapidly increasing.
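As a rough illustration of how those counts surface in practice, here is a minimal sketch of reading token usage back from a Bedrock converse call. The model ID is an assumed placeholder; the point is that a single output-token number can't capture the multi-step reasoning Garman describes.

```python
# Minimal sketch: reading token counts from a Bedrock converse response.
# The model ID is an assumed placeholder. For reasoning models, outputTokens
# reflects only what the model emits; long internal "thinking" (tool calls,
# multi-step research) is not captured by this one number.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
    messages=[{"role": "user", "content": [{"text": "Why is the sky blue?"}]}],
)

usage = response["usage"]
print("input tokens: ", usage["inputTokens"])
print("output tokens:", usage["outputTokens"])
print("total tokens: ", usage["totalTokens"])
```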
Project Rainier, a massive custom server design project. What is the operational status and latest on Project Rainier?

Yeah, so we're incredibly excited about it. Project Rainier is a collaboration that we have with our partners at Anthropic to build the largest compute cluster that they'll use to train the next generation of their Claude models. And Anthropic has the very best models out there today. Claude 4 just launched, I think it was last week, and it's been getting incredible adoption from our customer base.

Anthropic is going to be training the next version of their model on top of Trainium 2, which is Amazon's custom-built accelerator processor, purpose-built for AI workloads. And we're building one of the largest clusters ever released. It's an enormous cluster, more than five times the size of the last one they trained on, which produced, again, the world's leading model. So we're super excited about that. We're landing Trainium 2 servers now and they're already in operation, and Anthropic is already using parts of that cluster. The performance that we're seeing out of Trainium 2 continues to be very impressive and really pushes the envelope, I think, on what's possible, both on an absolute-performance basis and on a cost-performance and scale basis. I think some of those are going to be equally important as we move forward in this world.
'Cause today, much of the feedback you get is that AI is still too expensive. The costs are coming down pretty aggressively, and it's still too expensive. And so we think there are a number of things that need to happen there. Innovation at the silicon level is one of the things that needs to help bring the cost down, as well as innovation on the software and algorithmic side, so that you use less compute per unit of inference or training. All of those are important to bring that cost down and make it more and more possible for AI to be used in all of the places we think it will be over time.
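To see why both levers matter, here is a back-of-the-envelope cost model. All prices and token counts are hypothetical illustrations, not AWS or model pricing: silicon innovation shows up as a lower per-token price, while algorithmic innovation shows up as fewer tokens per unit of work.

```python
# Back-of-the-envelope inference cost model. The prices and token counts
# below are hypothetical illustrations, not real AWS or model pricing.

def cost_per_request(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one request given per-1K-token prices."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# Hypothetical baseline: 2,000 input tokens, 500 output tokens per request.
baseline = cost_per_request(2000, 500, price_in_per_1k=0.003, price_out_per_1k=0.015)

# Silicon-level innovation: cheaper compute halves the per-token price.
cheaper_silicon = cost_per_request(2000, 500, price_in_per_1k=0.0015, price_out_per_1k=0.0075)

# Algorithmic innovation: half the tokens needed per unit of work.
fewer_tokens = cost_per_request(1000, 250, price_in_per_1k=0.003, price_out_per_1k=0.015)

print(f"baseline: ${baseline:.4f}, cheaper silicon: ${cheaper_silicon:.4f}, "
      f"better algorithms: ${fewer_tokens:.4f}")
```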
Matt, on Wednesday, NVIDIA CEO Jensen Huang summarized inference demand for me. I just wanted to play you that soundbite.

Sure.

Well, we've got a whole bunch of engines firing right now. The biggest one, of course, is the reasoning AI inference. The demand is just off the charts. You see the popularity of all these AI services now.
Your pitch for Trainium 2, and as you know, I've kind of taken apart the server design and looked at it, is the efficiency and cost efficiency relative to NVIDIA tech. Are you seeing that same demand Jensen outlined for Trainium 2, outside of the relationship with Anthropic?
Yeah, look, we're seeing it across a number of different places, but it's not really Trainium 2 versus NVIDIA, and I think that's not really the right way to think about it. The opportunity in this space is massive. It's not one versus the other; we think there's plenty of room for both. And Jensen and I speak about this all the time. NVIDIA has built a really strong platform that's useful and is the leading platform for many, many applications out there. And so we are incredible design partners with them. We make sure that we have the latest NVIDIA technology for everyone, and we continue to push the envelope on what's possible with all of the latest NVIDIA capabilities. And we think there's room for Trainium and other technologies as well, and we're really excited about that. Many of the leading AI labs are incredibly excited about using Trainium 2 and really leaning into the benefits that you get there. But for a long time, these things are going to be living in concert together. I think there's plenty of room, and customers want choice. At the end of the day, customers don't want to be forced into using one platform or the other. They'd love to have choice, and our job at AWS is to give customers as much choice as possible.
What is general availability of NVIDIA GB200 for AWS? And have you, I guess, launched Grace Blackwell-backed instances yet?

Yes. So we've launched what we call P6 instances, and those are available in AWS today. Customers are using them and liking them, and the performance is fantastic. We're continuing to ramp capacity, and we work very closely with the NVIDIA team to ramp it aggressively; demand is strong for those P6 instances. But customers are able to go and test those out today. And like I said, we're ramping capacity incredibly fast all around the world, in our various different regions.
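For readers who want to try them, here is a minimal sketch of requesting one of these Blackwell-backed instances with boto3. The instance type string and AMI ID are assumptions for illustration; check AWS documentation for the exact names and the regions where P6 capacity is available.

```python
# Minimal sketch: requesting a Blackwell-backed P6 instance with boto3.
# The instance type string and AMI ID below are assumed placeholders,
# not confirmed values from the interview.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder deep learning AMI
    InstanceType="p6-b200.48xlarge",   # assumed P6 instance type name
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```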
Matt, what is your attitude to Claude, Anthropic's model, being available elsewhere, on Azure AI Foundry, for example?

Great. I mean, that's okay too. I think many of our customers make their applications available in different places, and we understand that various customers want to use capabilities in different areas and different clouds. Our job, and this is what we do, is to make AWS the best place to run every type of workload. And that includes Anthropic's Claude models,
but it includes a wide range of things. And frankly, that's why we see big customers migrating over to AWS. Take somebody like Mondelez, who's really gone all in with AWS and moved some of their workloads there. One of the reasons is that they see we have capabilities, sometimes using AI, by the way, that really help them optimize their costs and have the most available, most secure platform. In Mondelez's case, they're taking many of their legacy Windows platforms, transforming them into Linux applications, and saving all of that licensing cost. We have many customers doing that, and so our job is to make AWS by far the most technically capable platform, the one with the widest set of services. And that's what we do. But I'm perfectly happy for other people to use these models elsewhere; it's great that Anthropic is making Claude available elsewhere, and we see the vast majority of that usage happening on AWS, though.

Will we see OpenAI models on AWS this year?

Well, just as we encourage all of our partners to be available elsewhere, I'd love for others to take that same tack.
Let's end with this, a question from the audience actually: where are you going to grow data center capacity around the world? I got a lot of questions from Latin America and Europe in particular, where Jensen flies next week.
Great. So in Latin America, we're continuing to expand our capacity pretty aggressively. Earlier this year, we launched our Mexico region, which has been really well received by customers, and we've announced a new region in Chile. And we have had, for many years, a region in Brazil, which is quite popular and has many of the largest financial institutions in South America running there. So across Central and South America, we are continuing to rapidly expand our footprint.

In Europe, we're expanding as well. We have many regions already in Europe. One of the things I'm most excited about, actually, is that at the end of this year we're going to be launching the European Sovereign Cloud, which is a unique capability that no one else has, completely designed for critical EU-focused sovereign workloads. And given some of the concerns that folks have around data sovereignty, particularly for government workloads as well as regulated workloads, we think that's going to be an incredibly popular offering for everybody.
Matt Garman, AWS CEO, thank you very much.

Thank you for having me.