
AWS CEO Matt Garman Talks AI Roadmap

2025/5/30

Bloomberg Talks

People
Matt Garman
Topics
Matt Garman: As CEO of AWS, what has excited me most over the past year is how quickly customers are innovating in AI and how eagerly they are adopting new technologies. Customers are accelerating their cloud migrations, especially around AI and agentic technologies, and are increasingly moving their entire estates to AWS. AWS's AI business has already reached a multi-billion dollar scale, but this is only the beginning of the AI transformation. I believe every business, every industry, and every job will be fundamentally transformed by AI. AWS's generative AI revenue is already in the multiple billions of dollars, and Amazon itself uses AI extensively to optimize operations and improve the user experience.

Customers are using AWS to completely reinvent their contact centers, and they are building their own models on AWS's custom chips or NVIDIA processors. Over time, inference accounts for a growing share of AI workloads and will eventually make up the vast majority. Inference is how AI gets embedded into the applications everyone uses; every application will have inference built in, just like compute, storage, and databases. AI will be embedded in the user experience and become a core part of applications, improving efficiency, capability, and user experience.

Project Rainier, built in collaboration with Anthropic, is the largest compute cluster for training its next generation of Claude models. Anthropic will train its next-generation model on Trainium 2, an Amazon accelerator processor purpose-built for AI workloads, and we are building one of the largest clusters ever. Trainium 2 servers are already in operation, Anthropic is using part of the cluster, and its performance is outstanding in absolute performance, cost efficiency, and scale. AI today is still too expensive; we need to bring costs down through innovation in chips, software, and algorithms so it can be applied in more areas.

Trainium 2 is not in competition with NVIDIA; the market is enormous, NVIDIA is a strong platform, and AWS is a design partner with NVIDIA. AWS makes sure customers get the latest NVIDIA technology and keeps pushing the limits of what NVIDIA's capabilities make possible, while also leaving room for Trainium and other technologies. Customers need choice, and AWS's job is to give customers as much choice as possible; Trainium and other technologies have plenty of room to grow. AWS has launched P6 instances (based on NVIDIA Grace Blackwell); customers are using them and are pleased with the performance, and AWS is ramping capacity quickly.

AWS's goal is to be the best place to run every kind of workload, including Anthropic's Claude models, and it is committed to offering the most advanced technical capabilities and the broadest set of services. Customers choose AWS because it helps them optimize costs and offers the most available, most secure platform; Mondelez, for example, converted its legacy Windows platforms into Linux applications and saved the licensing costs. AWS aims to be the most technically capable platform with the most diverse set of services and welcomes other companies offering services too. AWS encourages all of its partners to make their offerings available elsewhere and hopes other companies will take the same approach.

AWS is expanding aggressively in Latin America, including launching the Mexico region, announcing a new region in Chile, and operating a popular region in Brazil. AWS is also expanding in Europe and plans to launch the European Sovereign Cloud by the end of this year, a distinct capability designed specifically for critical EU sovereign workloads. Given concerns about data sovereignty, particularly for government and regulated workloads, the European Sovereign Cloud will be a very welcome opportunity.

Deep Dive

Chapters
This chapter explores the massive growth of Amazon's AI business, specifically focusing on AWS's contribution. It delves into the mix of customer-run models, hosted models like Amazon Bedrock, and applications such as Amazon Q. The discussion also highlights the transformative potential of AI across various industries and jobs.
  • AWS's AI business is in the multi-billion dollar range and is primarily driven by customers using AWS.
  • This revenue includes a mix of customer-run models, hosted models, and applications.
  • AI is expected to fundamentally transform every business, industry, and job.
  • Inference is becoming the dominant AI workload, surpassing training in usage.

Transcript


This is an iHeart Podcast. Thrivent can help you plan your finances for the people, causes, and community you love. What makes Thrivent different? Financial services and generosity programs are combined to help you build a financial roadmap for the future, while also creating opportunities to give back along the way. Visit Thrivent.com to learn more. Thrivent, where money means more. Bloomberg Audio Studios. Podcasts, radio, news. ♪

Welcome to our Bloomberg Radio and television audiences worldwide. We go right now to a conversation with Matt Garman, AWS CEO. Matt, it's good to catch up. It has been basically one year that you've been in the role as AWS CEO. As a place to start, what has been the biggest achievement in that time for AWS?

Yeah, thanks for having me on. It's nice to be here again. Yeah, it's been a fantastic year of innovation. It's really been incredible. And as I look out there, one of the things that I've been most excited about is how fast our customers are innovating and adopting many of the new technologies that we have.

And as you think about customers that are on this cloud migration journey, many of them have been doing that for over the last several years. But this year in particular, we've really seen an explosion of AI technologies, of agentic technologies, and increasingly we're seeing more and more customers move their entire estates into the cloud and AWS. So it's been really fun to see. It's been an incredible pace of technology, and it's been a really fun first year.

The moment that investors kind of sat up and paid attention was when Amazon said that its AI business was at a multi-billion dollar run rate in terms of sales. What we don't understand as well is what proportion of that is AWS infrastructure?

Yeah, that is AWS, right? And so the key is that's a mix of customers running their own models. Some of that is on Amazon Bedrock, which is our hosted-models service, where we have first-party models like Amazon Nova as well as many third-party models like Anthropic's. And some of those are applications, things like Amazon Q, which helps people do automated software development, as well as a host of other capabilities.

And so there's a mix of that. And I think part of the most interesting thing about being at a multi-billion dollar run rate is we're at the very earliest stages of how AI is going to completely transform every single customer out there. We talk to customers and we look at where the technology landscape is.

And we firmly believe that every single business, every single industry, and really every single job is going to be fundamentally transformed by AI. And I think we're starting to see the early stages of that. But again, we're just at the very earliest stages of, I think, what's going to be possible. And so that multi-billion dollar business that we have today is really just the start. Can you give me a generative AI revenue number?

For the world or for AWS? For you guys, for AWS. Maybe Amazon as a whole. Yeah, like I said, we are in multiple billions of dollars, and that's for customers using AWS. We also use lots of generative AI inside of Amazon for a wide range of things.

We use it to optimize our fulfillment centers. We use it when you go to the retail site to summarize reviews or to help customers find products in a faster and more interesting way. We use AI in Alexa, in our new Alexa Plus offering, where we conversationally talk to customers through the Alexa interface and help them accomplish things through voice that they were never able to do before. So every single aspect of what Amazon does leverages AI, and our customers are exactly the same.

Customers are looking to AWS to completely change, whether it's their contact centers through something like Amazon Connect, where it has AI capabilities built in so that you don't have to go program it, all the way down to our custom chips or NVIDIA processors or anything where customers at the metal are building their own models. We have the whole range of people that are building AI on top of AWS, as well as Amazon themselves.
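As a rough illustration of the hosted-model slice of that mix, here is a minimal sketch of calling a Bedrock-hosted model with boto3. The region and the Nova model ID are assumptions; you would substitute whichever first- or third-party model your account has enabled.

```python
# Minimal sketch: invoking a hosted model on Amazon Bedrock via boto3.
# Assumes boto3 is installed, AWS credentials are configured, and your
# account has access to the referenced model ID (an assumed example).
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed ID; any enabled Bedrock model works
    messages=[
        {"role": "user", "content": [{"text": "Summarize our Q2 support tickets."}]}
    ],
)

print(response["output"]["message"]["content"][0]["text"])
```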

We always credit AWS as being the number one hyperscaler. But given what you said there about what clients are using, from the silicon level through to capacity, it would really help if you could tell me what proportion of workloads is being run for training and what proportion is being run for inference. Sure.

Yeah, and that changes over time. I think, look, as we progress, more and more of the AI workloads are inference. I'd say in the early stages of generative AI, a lot of the usage was dominated by training, as people were building these very large models with small amounts of usage. Now the models are getting bigger and bigger, but the usage is exploding at a rapid rate.

And so I expect that over the fullness of time, 80%, 90%, the vast majority of usage is going to be inference. And really, just for all those out there, inference is how AI is embedded in the applications that everybody uses. As we think about our customers building, there's a small number of people who are going to be building these models, but everyone out there is going to use inference as a core building block in everything they do.

Every application is going to have inference, and already you're starting to see inference built in to every application. We think about it as just the new building block. It's just like compute, it's just like storage, it's just like a database. Inference is a core building block.

And so as you talk to people who are building new applications, they don't think about it as AI is over here and my application is over here. They really think about AI as embedded in the experience. And so increasingly, I think it's going to be difficult for people to say what part of your revenue is driven by AI. It's just part of the application that you're building. It's going to be a core part of that experience, and it's going to deliver lots of benefits in efficiency, in capabilities, and in user experience for all sorts of applications and industries.
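To make the building-block framing concrete, here is a hedged sketch of an application handler that treats inference as just another dependency next to storage and a database. The bucket, table, and model names are all hypothetical.

```python
# Sketch: inference as one building block among storage and a database.
# All resource names (bucket, table, model ID) are hypothetical examples.
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
bedrock = boto3.client("bedrock-runtime")

def summarize_order_feedback(order_id: str) -> str:
    # Storage building block: fetch raw feedback text from S3.
    raw = s3.get_object(Bucket="example-feedback", Key=f"{order_id}.txt")
    text = raw["Body"].read().decode("utf-8")

    # Inference building block: summarize it, same as any other service call.
    result = bedrock.converse(
        modelId="amazon.nova-lite-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": f"Summarize: {text}"}]}],
    )
    summary = result["output"]["message"]["content"][0]["text"]

    # Database building block: persist the summary alongside the order.
    dynamodb.Table("example-orders").update_item(
        Key={"order_id": order_id},
        UpdateExpression="SET #s = :s",
        ExpressionAttributeNames={"#s": "summary"},
        ExpressionAttributeValues={":s": summary},
    )
    return summary
```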

But present day, is it fair to say the majority is still training? No, I think that at this point, definitely more usage is inference than training. We want to welcome our radio and television audiences around the world. We're speaking to AWS CEO Matt Garman, who officially next week celebrates one year in that role leading AWS. A new metric that has been discussed, particularly this earnings season, and we discussed it with NVIDIA CEO Jensen Huang this week, is token growth and tokenization. Has AWS got a metric to share on that front?

I don't have any metrics to share on that front, but I think one of the measures that we can look at is the number of tokens being served out there, though it's not the only one. And I increasingly think that people are going to be thinking about these things differently. Tokens are a particularly interesting thing to look at when you're thinking about text generation, but not all tokens are created equal. Particularly as you think about AI reasoning models, the input and output tokens don't necessarily capture the work that's being done.

Increasingly, you're seeing models that can do work for a really long period of time before they output tokens. You're having these models that can sometimes think for hours at a time, right? You ask these things to go and actually do research on your behalf. They can go out to the internet, they can pull information back, they can synthesize, they can redo things.

If you think about coding and Q Developer, we're seeing lots of coding where it goes and actually reasons, does iteration after iteration, improves on itself, looks at what it's done, and then eventually outputs the end result. And so at some point the final output token count is not really the best measure of how much work is being done. If you think about images, if you think about videos, there's a lot of content being created and a lot of thought being done. So tokens are one aspect of it, and that's an interesting measure, but I don't think it's the only measure to look at, although they are rapidly increasing.
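A toy illustration of that point about reasoning models: if you only count the tokens a caller sees, long hidden reasoning goes unmeasured. Every number below is an invented assumption.

```python
# Toy model of token accounting for a reasoning workload.
# Every figure here is an invented assumption for illustration only.

def total_tokens_processed(input_tokens: int,
                           reasoning_tokens: int,
                           output_tokens: int) -> int:
    # The model actually processes input + hidden reasoning + output.
    return input_tokens + reasoning_tokens + output_tokens

# A hypothetical research-style query: small prompt, long hidden
# chain of reasoning, short final answer.
visible = 500 + 800                      # input + output tokens a caller sees
actual = total_tokens_processed(input_tokens=500,
                                reasoning_tokens=40_000,
                                output_tokens=800)

print(f"visible tokens: {visible}")                # 1300
print(f"actual tokens:  {actual}")                 # 41300
print(f"undercount:     {actual / visible:.0f}x")  # ~32x
```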

Project Rainier, massive custom server design project. What is the operational status and latest on Project Rainier? Yeah, so we're incredibly excited about it. So Project Rainier is a collaboration that we have with our partners at Anthropic to build the largest compute cluster that they'll use to train the next generation of their Claude models.

And Anthropic has the very best models out there today. Claude 4 just launched, I think it was last week. And it's been getting incredible adoption out there from our customer base.

Anthropic is going to be training the next version of their model on top of Trainium 2, which is Amazon's custom-built accelerator processor, purpose-built for AI workloads. And we're building one of the largest clusters ever. It's an enormous cluster, more than five times the size of the last one they trained on, which, again, produced the world's leading model. So we're super excited about that.

We're landing Trainium 2 servers now, and they're already in operation, and Anthropic is already using parts of that cluster. So we're super excited about that, and the performance that we're seeing out of Trainium 2 continues to be very impressive. It really pushes the envelope, I think, on what's possible, both on an absolute-performance basis and on a cost-performance and scale basis. I think some of those are going to be equally important as we move forward in this world.

'Cause today, much of the feedback you get is that AI is still too expensive. The costs are coming down pretty aggressively, and it's still too expensive. And so we think there's a number of things that need to happen there. Innovation at the silicon level is one of the things that needs to help bring the cost down, as well as innovation on the software side and the algorithmic side, so that you use less compute per unit of inference or training. All of those are important to bring that cost down and make it more and more possible for AI to be used in all of the places that we think it will be over time.
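As a back-of-the-envelope view of those three levers, here is a sketch with invented numbers: silicon innovation lowers the hourly price, software innovation raises throughput, and algorithmic innovation shrinks the tokens needed per request.

```python
# Back-of-the-envelope inference economics. All inputs are invented
# assumptions; only the arithmetic is the point.

def cost_per_request(instance_usd_per_hour: float,
                     tokens_per_second: float,
                     tokens_per_request: float) -> float:
    usd_per_token = instance_usd_per_hour / (tokens_per_second * 3600)
    return usd_per_token * tokens_per_request

baseline = cost_per_request(40.0, 5_000, 2_000)      # hypothetical today
# Each lever applied: cheaper chip, faster serving stack, leaner algorithm.
improved = cost_per_request(40.0 * 0.7,              # silicon: -30% price
                            5_000 * 2.0,             # software: 2x throughput
                            2_000 * 0.5)             # algorithm: half the tokens

print(f"baseline: ${baseline:.5f}/request")   # ~$0.00444
print(f"improved: ${improved:.5f}/request")   # ~$0.00078, ~5.7x cheaper
```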

Matt, on Wednesday, NVIDIA CEO Jensen Huang summarized inference demand for me. I just wanted to play you that soundbite. Sure. Well, we've got a whole bunch of engines firing right now. The biggest one, of course, is the reasoning AI inference. The demand is just off the charts. You see the popularity of all these AI services now.

Your pitch for Trainium 2, and as you know, I've kind of taken apart the server design and looked at it, is the efficiency and cost efficiency relative to NVIDIA tech. Are you seeing the same demand Jensen outlined for Trainium 2 outside of the relationship with Anthropic?

Yeah, look, we're seeing it across a number of different places, but it's not really Trainium 2 versus NVIDIA, and I think that's not really the right way to think about it. The opportunity in this space is massive. It's not one versus the other; we think that there's plenty of room for both. And Jensen and I speak about this all the time. NVIDIA is an incredibly fantastic platform. They've built a really strong platform that's useful and is the leading platform for many, many applications out there. And so we are incredible design partners with them. We make sure that we have the latest NVIDIA technology for everyone, and we continue to push the envelope on what's possible with all of the latest NVIDIA capabilities.

And we think there's room for Trainium and other technologies as well, and we're really excited about that. Many of the leading AI labs are incredibly excited about using Trainium 2 and really leaning into the benefits that you get there. But for a long time, these things are going to be living in concert together. I think there's plenty of room, and customers want choice. At the end of the day, customers don't want to be forced into using one platform or the other. They'd love to have choice, and our job at AWS is to give customers as much choice as possible.

What is general availability of NVIDIA GB200 for AWS? And have you, I guess, launched Grace Blackwell-backed instances yet? Yes. Yep. So we've launched what we call P6 instances, and those are available in AWS today. Customers are using them and liking them, and the performance is fantastic. So those are available today, and we're continuing to ramp capacity.
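If you wanted to check from the API side where instances of a given family are offered, a hedged sketch follows; the "p6*" name pattern is an assumption about how the family is labeled, so adjust it to the actual instance-type names in your account.

```python
# Sketch: list availability-zone offerings for an EC2 instance-type family.
# The "p6*" wildcard is an assumed naming pattern, not a confirmed one.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

offerings = ec2.describe_instance_type_offerings(
    LocationType="availability-zone",
    Filters=[{"Name": "instance-type", "Values": ["p6*"]}],
)

for o in offerings["InstanceTypeOfferings"]:
    print(o["InstanceType"], "available in", o["Location"])
```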

We work very closely with the NVIDIA team to aggressively ramp capacity, and demand is strong for those P6 instances. But customers are able to go and test those out today. And like I said, we're ramping capacity incredibly fast all around the world, in our various different regions. Matt, what is your attitude to Anthropic's Claude model being available elsewhere, on Azure Foundry, for example?

Great. I mean, that's okay too. I think many of our customers make their applications available in different places, and we understand that various customers want to use capabilities in different areas and different clouds. Our job, and this is what we do, is to make AWS the best place to run every type of workload. That includes Anthropic's Claude models, but it includes a wide range of things.

And frankly, that's why we see big customers migrating over to AWS. Take somebody like Mondelez, who's really gone all in with AWS and moved some of their workloads there. One of the reasons is that they see that we have capabilities, sometimes using AI, by the way, that really help them optimize their costs and have the most available, most secure platform. In Mondelez's case, they're taking many of their legacy Windows platforms and transforming them into Linux applications, saving all of that licensing cost.

We have many customers who are doing that, and so our job is to make AWS by far the most technically capable platform, the one with the widest set of services. And that's what we do. But I'm perfectly happy for other people to use it elsewhere; it's great that Anthropic is making Claude available elsewhere. We see the vast majority of that usage happening in AWS, though. Will we see OpenAI models on AWS this year? Well, just like we encourage all of our partners to be available elsewhere, I'd love for others to take that same tack.

Let's end it with this, a question from the audience, actually: where are you going to grow data center capacity around the world? I got a lot of questions from Latin America and Europe in particular, where Jensen flies next week.

Great. So in Latin America, we're continuing to expand our capacity pretty aggressively. Earlier this year, we launched our Mexico region, which has been really well received by customers, and we've announced a new region in Chile. And we already have, and for many years have had, a region in Brazil, which is quite popular and has many of the largest financial institutions in South America running there. So across Central and South America, we are continuing to rapidly expand our capacity.
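For the curious, a small sketch of enumerating the regions visible to an account with boto3; the region codes in the comments are examples, and the Mexico code in particular is an assumption worth verifying.

```python
# Sketch: list the AWS regions visible to the calling account.
# Region codes mentioned in comments are assumptions to verify.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# AllRegions=True also returns regions that exist but are not opted in.
regions = ec2.describe_regions(AllRegions=True)

for r in sorted(regions["Regions"], key=lambda r: r["RegionName"]):
    # e.g. sa-east-1 (Brazil); the Mexico region is reportedly mx-central-1.
    print(r["RegionName"], "-", r["OptInStatus"])
```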

In Europe, we're expanding as well. We have many regions already in Europe. One of the things I'm most excited about, actually, is that at the end of this year we're going to be launching the European Sovereign Cloud, which is a unique capability that no one else has, completely designed for critical EU-focused sovereign workloads. And we think, given some of the concerns that folks have around data sovereignty, particularly for government workloads as well as regulated workloads, that's going to be an incredibly popular opportunity for everybody.

Matt Garman, AWS CEO, thank you very much. Thank you for having me.