The Amazon leaders who spearheaded the new Alexa are here in studio to talk about what it took to rebuild the pioneering AI and where voice AI is headed in the age of large language models. That's coming up right after this.
Hey, you. I'm Andrew Seaman. Do you want a new job? Or do you want to move forward in your career? Well, you should listen to my weekly show called Get Hired with Andrew Seaman. We talk about it all. And it's waiting for you, yes you, wherever you get your podcasts.
Struggling to keep up with customers? With AgentForce and Salesforce Data Cloud, deploy AI agents that know your customers and act on their own. That's because Data Cloud brings all your data to AgentForce, no matter where it lives. Get started at salesforce.com slash data.
Welcome to Big Technology Podcast, a show for cool-headed, nuanced conversation of the tech world and beyond. We're joined today by Panos Panay, Amazon Senior Vice President of Devices and Services, and Daniel Rausch, Amazon's Vice President of Alexa, for a fascinating conversation about what it took to rebuild Alexa effectively from the ground up. Gentlemen, so great to see you. Welcome to the show.
Thanks, man. So great to be here. It sounded kind of fun. You both must be relieved to have this out. Yeah. I mean, excited. Relieved is a tricky word on this one. You know, we're finishing the product now. It's coming out next month. So we're pumped that we're through the event. And yeah, there's some relief, I would say. Would you agree? You feel a little bit of relief, but the truth is like it's all about getting it in a customer's hands as fast as possible. Yeah.
So the team's still feeling that urgency right now. Yeah, that's the big moment for the team, right? You get that first customer response. So we still feel like we're building towards it. But yesterday was great. Okay, so I have three Echo devices in my house. We have three rooms. Yeah, what are they? House is generous. But in my apartment, there's one in the bedroom. There's one in the kitchen slash dining room. And there's one in the office. Yeah. So they are first generation. I'm really looking forward to getting these updates working, hopefully within these devices and
and getting a chance to use a new and improved Alexa. I've been hanging on to the Echos for a long time in the hope that something like this would happen. So we're here. And I was at your event where you were announcing it. I'll give listeners a little bit of understanding of what I saw. And then we're going to go into some questions about what it was like to build this. So this new Alexa, it's called Alexa Plus.
It is conversational, so it understands natural language. It understands your context. And you don't have to say Alexa every time. It'll sort of have a back and forth with you. It is – I think you could call it agentic. It allows you to take action like book a table, call an Uber. It will go out in the world and help monitor ticket prices for you, for instance.
And it's also deeply integrated into Amazon services, namely Prime. It's going to be free for Prime members, $19.99 a month if you're not a Prime member. And the coolest thing I saw in the demo was that I think one of you asked for the song with, what was it, Bradley Cooper and Lady Gaga? I didn't say Lady Gaga. I just said Bradley Cooper. Bradley Cooper. What was the movie called?
A Star Is Born. A Star Is Born. Great movie. Great movie. And then it called up and played the song, and then you said, now let me see it in the movie, and it connects to Prime Video, and you could see it in the movie. So very cool product. Definitely, I think, what a lot of us have been hoping to see from the Alexa team and from Amazon on Alexa. We're going to talk a little bit about what it took to build it and then the strategy here. So
I think the first question I need to ask you both is, what took so long? Because for all of us who've been watching, and I think there are 500 or 600 million Alexa-enabled devices out there, we've been wondering, as the OpenAIs of the world and other companies have made these big advances on voice AI, when Amazon was going to make its move. And you have made the move, but what was the process that made it take as long as it has, Panos? I think the easiest way to say it is,
When you have hundreds of millions of customers that are active right now, I mean, we talked a little bit about it yesterday, but every one of them matters. How do we make sure they all get the great experience they need? Meaning you can't start from zero and ignore it. And if you could, it could be much faster, although it's not that easy to hook up the thousands of APIs and all the partners that we're bringing together and all the experts. It takes time. But the first thing is,
There are two parts to it. But the first thing is you've got hundreds of millions of customers. They love certain things that they do on Alexa today. They might not love everything, but they love certain things for sure. You can't leave that behind. You can't wake up one day and, whatever you use Alexa for, whether it's timers or music, not make it better and great, so that you don't feel like something was taken away from you. When you take something away from a customer,
You've just missed. You've missed. And so that's one. It takes time to make sure you can get it all done. So everything on what you would call Alexa, not Alexa Plus, works on Alexa Plus, but better.
And that was just the first point, part of the vision: can't leave anyone behind, which was important. We can talk about devices and so forth, but customers love the products that are in their homes and they need them; we can't take that away. That was one. Second piece is you're re-architecting from the ground up. So you've got first the weight of keeping hundreds of millions of customers, and then you're re-architecting from the ground up.
If we started from zero customers, I think this is a different story. You can move a lot faster. We can solve problems and then just add features as we go, if that makes sense. So maybe we just had a conversationalist, a pretty cool one. Then we can add personalization. Then we can add memory. Then we can add the experts. And people would just get updates along the way and maybe learn and be great. However, on day one, we need to support everything people love and know about Alexa, day one. And so a little bit of patience there.
And it takes a little bit longer. And the vision, the vision was clear: we're going to bring a conversational agent forward, an assistant for everyone that is smart, has memory, can personalize to you, and then ultimately is incredibly useful. And so when we had that laid out, we said, okay, great, but we can't leave any customers behind. And right at that point you kind of step back and
Once you put the vision together, you realize you need a full re-architecture, but you're not going to leave your customers out. So you're re-architecting pretty much two stacks at that point. One, what is classically known as Alexa, to be awesome and come into this conversational world.
And the other is everything new that it has to do. Yeah, and I want to go a level deeper with Daniel on this one because, Panos, what you're talking about, a re-architecture, is sort of what I've heard has been the holdup here with Alexa for all these years, which is that – and Daniel, tell me if I'm wrong. But basically what folks have told me is that the old version or the original version of Alexa was built –
With a lot of like if-then commands, right? So, you know, it will understand some structured commands. Turn on the lights. Okay, then it will take that and almost deterministically say, okay, I understand this command, this is what I'm going to do, turn the switch. With large language models, right?
It's a completely different ballgame because you have to make room for uncertainty. So actually, the fact that you've been able to introduce an Alexa built on large language models while keeping that functionality, I think, is an engineering feat. That's my perspective from the outside. What is it actually like on the inside and how close is that assessment to the challenge? Well, the team would love to hear you say engineering feat because I do think that's the –
It is real. That is the size of the task for sure. I think you're onto it for sure. You know, large language models, the one thing I'd add just in terms of thinking through the technical architecture to what Panos said is that it's really just the latest generations of large language models that can even
do the things that Alexa needs to be able to do. So you're talking about our Nova models, right? Which we announced within the last few months and are starting to get into customers' hands. That's super exciting. You know, the partnership that we have with Anthropic. Like you really need very state-of-the-art technology at the base of the architecture in those large language models.
And in large part because of what you said. We need them to behave in ways that we can predict and are certain. Someone says, lock my door or play that song. You want it to happen, right? Some are higher consequence than others and you really need to get it right. But you also want all the elegance and nuance and understanding and non-deterministic behaviors right?
of large language models themselves, right? So we would call that a stochastic system that, you know, it's literally at runtime that you're making those determinations. So if you want to integrate tens of thousands of services on day one, day one out of the box,
take advantage of everything that Alexa's always been able to do, as Panos was saying, and introduce all of this new unbelievable behavior that you can get out of large language models, that is a big engineering feat. So how does it know when the user is saying turn the lights on versus like,
something more esoteric? Is there something built within the technology that's kind of like a switcher that first determines your intent and then decides which part of the model to send it out to? The way to think about it is, you know, at the base level you have large language models, and you have this model-agnostic system that's even itself going to choose the right model for the job, and the models play different roles in there. What's already happened,
even honestly, sort of in the way you asked a few of the questions, is that people assume the large language model is the product. A product like Alexa is
so much more than quote-unquote just a large language model. So you have models playing many different roles in the system overall, even models helping us decide which model and models themselves deciding if they're the best tool for the job, so to speak. So then you have a system that progressively decides how to get something done. I wouldn't think about it like a switch or something in classic computer science that is a gate. That's not how the system works. It's
It's a collection of model behaviors and systems downstream of that that complete specific tasks. And that's where we introduced this term expert to try to help coalesce around
the system behavior and explain it better. The large language models are interacting with these experts that do things like get you the sports score, play a song, play a video, know where you are in the song so that you can go to the video, like all the things that you saw yesterday at the event. And so Panos, this is a mixture of experts model. It is. You think about it in a mixture of experts model, but each expert theoretically has its own model as well. So you're building on top of it. Each expert is smarter.
When you think experts, it's a weird term, yeah? But think photos, smart home, entertainment, whether that's music or video, local info, all the partners that connect. You have a communication expert. You have an artifact expert. You have a memory expert. You have a personalization expert. Each of them plays a role, and they kind of arbitrate with each other at all times.
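To make the routing idea concrete, here is a minimal illustrative sketch in Python. It is not Amazon's implementation; the expert names, the Context fields, and the keyword-based router standing in for an LLM are all hypothetical.

```python
# Hypothetical sketch of expert routing, not Amazon's actual code.
# A lightweight "router" picks which expert should handle an utterance;
# each expert then completes the task with its own logic (and, in a
# real system, its own model and service integrations).

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Context:
    room: str                  # where the utterance came from
    recent_topics: List[str]   # short conversational history


def music_expert(utterance: str, ctx: Context) -> str:
    # Would call a music service API; here we just describe the action.
    return f"Playing music in the {ctx.room}."


def smart_home_expert(utterance: str, ctx: Context) -> str:
    return f"Adjusting devices in the {ctx.room}."


EXPERTS: Dict[str, Callable[[str, Context], str]] = {
    "music": music_expert,
    "smart_home": smart_home_expert,
}


def route(utterance: str, ctx: Context) -> str:
    # Stand-in for the LLM-based router: in practice a model would
    # reason over the utterance plus context and pick an expert.
    expert_name = "music" if "song" in utterance or "play" in utterance else "smart_home"
    return EXPERTS[expert_name](utterance, ctx)


if __name__ == "__main__":
    ctx = Context(room="kitchen", recent_topics=["A Star Is Born"])
    print(route("play that song", ctx))
```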
So like the model is just lighting up when it determines that that's what you want to do. That's right. Daniel kind of said it well because the LLM at the bottom of that stack is – it's deterministic. It's choosing which model to use. Then the experts come into play on top of it. It's a pretty phenomenal way to –
It's a pretty interesting way to think about it. This is a mixture of experts model for those at home. It's been part of what DeepSeek has used to become much more efficient in its reasoning, for instance, because instead of lighting up the entire large language model, it's deciding to light up certain areas that might be... I mean, it's not a DeepSeek innovation, but they've just kind of used it to an extreme extent. Has that...
Using that architecture helped you build this in a way that's, for instance, reducing latency or lightening the compute burden that you otherwise might have had? If you want something incredibly fast, stable...
Even secure, like the paths on data, right, where you're really taking care of customers. This is the fundamental approach, I think, that is state-of-the-art. And accurate. And for sure accurate. Don't forget accurate. So important. Yeah. But on that note, I mean, are the –
The new Alexa, is there going to be some sacrifice to having those Alexa commands, those standard turn the lights on, set the alarm, in order to enable all the LLMs to work the way that they're going to? I think you just called out the sacrifice and it's time. Okay. How long it's taken us to get to where we are. That's why it's my favorite question. Like, why is it taking you so long? Like, if I told you where we were four months ago, when somebody said, lock that door.
And then we had to determine what that meant versus in the past, lock my front door. And you had to know it was the front door and you had to say front door. It's pretty phenomenal. But, you know, six months ago, it took longer than anyone would wait to lock a door. And, you know, our customers need immediate response and we won't make that tradeoff. So to be that accurate with the latency that's needed with the speed sub two seconds at the end of the day.
You end up needing a little bit more time refining the expert so the expert can be quicker and the model can pick the right model quicker and the smaller model can be trained to make sure it knows where the door is. He gave an example earlier which I thought was a nuance, but let me just share it with you. Previously in Alexa, you couldn't say "play that song". It would look for a song called "that". It was that simple. Now the model has to reason over "that song".
I wonder what he's asking. I wonder what she's asking. I wonder what the person's asking. That's what's happening in the system. Then the expert shows up, looks at the history, the personalization. What conversation were we having? Play that song. Oh, he's talking about the conversation we just had about Bradley Cooper and Lady Gaga. Shallow. Play Shallow. That all happens in, you know, sub two seconds. How many milliseconds are we talking? We count in single milliseconds now, in each component. So all of that is going on in the stack, working through it, versus today, which is: play Shallow, and that's the only way you're going to play Shallow. Yeah, that's it. And so I think it's just understanding that nuance in where natural language comes in, where you can talk to Alexa
Without being precise, just like you can talk to me and I'll use some micro tells to get, you know, are you asking me a rude question, a great question, a nice question? Are you leading me? And then from those micro tells, I can then move to the words and then determine where you're taking me.
And you don't have to write it down, type it, and read it exactly. All that is happening now in the machine, which is pretty powerful. There was a cool scene in your demo at the launch event where I think, Panos, it was you who said, don't play the music in the baby's room. Yeah. So it's really – I didn't say that. So that's very explicit too, right? Don't play the music in the baby's room.
It will, the model will come up, the expert will show up, the music expert, which is where it's super powerful, and go, got it, play it everywhere else. Or you can just say, don't wake the baby, play the music everywhere. Then the model will go, don't play it in the baby's room. I know what they're asking. So this is where that small model in the expert does its job.
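A rough sketch of how the "don't wake the baby" reasoning might translate into a playback plan; the room list and the keyword-based inference standing in for the model are hypothetical, not Amazon's implementation.

```python
# Hypothetical illustration of the "don't wake the baby" example.
# In the real system an LLM would infer the constraint from natural
# language; here the inference is faked so the plan-building is clear.

from typing import List, Set

ROOMS = ["kitchen", "living room", "office", "baby's room"]


def infer_excluded_rooms(utterance: str) -> Set[str]:
    # Stand-in for model reasoning: "don't wake the baby" implies
    # excluding the baby's room even though it is never named.
    if "baby" in utterance.lower():
        return {"baby's room"}
    return set()


def plan_playback(utterance: str) -> List[str]:
    excluded = infer_excluded_rooms(utterance)
    return [room for room in ROOMS if room not in excluded]


print(plan_playback("don't wake the baby, play the music everywhere"))
# -> ['kitchen', 'living room', 'office']
```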
And the fact that you can just naturally move it around in that demo. I don't know if you noticed, by the way, nerve-wracking. Yeah. So for listeners, Panos did this entire demo live. I mean, we're going to talk about Apple Intelligence in a second, but Apple Intelligence. I was at the WWDC launch event and it was all a vision. And what we saw at this Alexa launch event was a working demo and
Now, look, I mean, we commentators know to reserve complete judgment until it's in our hands. Yeah, you have to. You have to, for sure. But it was real. It was all real. Real and working. Yeah. But what makes you nervous in an event like that, you're not worried about the product working. I mean, six months ago, I would have worried about the product working. And I would have shown you more vision demos, like videos. But the product's working. The challenge is the infrastructure problem.
The thousands of Wi-Fi signals that are pinging around that room. Like, it's just unusual. These live environments are very unusual. Turns out tech reporters like tech and they're using a lot of it.
We're all on the Wi-Fi. More, more. I mean, the signals that are being pulled from Bluetooth to Wi-Fi to, I mean, who knows what's in pockets? One of my favorite tech demo moments is Steve Jobs just losing his shit on stage because all the reporters are connected to Wi-Fi. And he's like, you can either be connected to Wi-Fi or you can have a demo. You pick.
Totally. I mean, I'm not, you know, I think bloggers have a right to blog, but if we want to see the demos, we're not going to be able to do it unless we turn off all these Wi-Fi base stations and laptops, set them on the floor. Yes. We didn't have to have that situation. And then you got, you know, the servers have to be lit up and you're worried about latency and what's happening in the room. So you got all that going on and now you're going to do live and this is your baby, right? I mean, you love what you're about to show. You love it. And if it doesn't go off, like,
I don't want to tell you what the backup plan was. What was the backup plan? We're not going to talk about it. For real? Let's not talk about the backup plan. Let me just say. You can't tease the backup plan and then just share the backup plan. They were really good. They were really good. It was not a great backup. No, it was. They were great. They weren't going to work, but they were great plans, I would say. I'm looking over here at some of the team that was helping yesterday. But during that moment, I was...
You may have heard, it was very nuanced. At one point I said, move the music, bring the music here. I want to hear the music over there. And the reason I use different sentences, I know what the model's going to reason over and do, but I wanted to make it clear, like, you don't have to think about what you want to happen. You just have to talk. I want the music over there. Okay. And if the model doesn't know, or if Alexa doesn't know, she'll ask you,
Do you mean in the living room? Yeah. So are we going to have a speed tradeoff here from the traditional Alexa tasks? Just quickly, Daniel, I'm curious. Is the stuff I was doing beforehand, like I'm doing now, set an alarm? Is it going to take a little longer because of this process or it'll be the same amount of time? No. I mean, this is where we have such a high bar before we're willing to put it out. And deterministic systems are incredibly fast.
It is straightforward computer science in this day and age, with an AWS cloud and the great connectivity that everyone has in their homes, to make a deterministic system fast on something exactly like you said. Making a non-deterministic system fast, one that can respond in any way, gathers all the context, figures out legions of different things, which experts to invoke. Making that system fast on something as simple as an instruction is hard. That is quite hard. What technological development,
breakthroughs or innovations did you rely on to get it from a place where you were dissatisfied with latency to a point now where you're happy? I think it's another version of using the right tool for the job and building a system that's frankly just more complex overall to get the simple things done. So there's an irony in that, but you need a system that
creates very fast paths for simple things. Even though you started with an incredibly complex system already, you're adding these kinds of complexities to get simple things done. So that, I mean, I won't go into the specific technical details here, but that's the upshot. You need to be able to figure out that you're trying to do something simple so that you can do it fast with a very complex system. And it gets tricky. You know, people understand how to speak to Alexa today. I think our new customers, and current customers, we want to open their minds on what they can ask for
and how to get something done. Take the simple tasks that we have, timers, alarms. There's a different way to think about them. And then in the non-deterministic world, how to translate what's being said into what's being asked for, which is different. An example: you asked how quick setting an alarm will be. It'll be lightning fast and you'll likely set it the way you always have. I need an alarm, set an alarm for 8 a.m. I think that's the classic way to set an alarm.
Or you can say, Alexa, I need to wake up tomorrow at 8. Okay. And now that's non-deterministic. And now it's going, I think you need an alarm. And then it'll offer you an alarm. Or just set it. Same with the timer. Set me a timer.
By the way, how long do you want the timer for? You say the time. You can move from "set me a timer" to "I'm cooking my steak medium rare." And then she'll say, I'm setting you a timer for six minutes. And so you understand, when you get into that natural language, non-deterministic, what's happening? What are you asking for? You're cooking your steak. Okay, I'll get you six minutes on each side. Or tell me how thick it is.
And then the answer is, you know, two inches thick, whatever. Or I want a ramen egg. That's eight minutes. I got you. Tell me when you start. I'm starting. Eight-minute timer started for you. And so the world just changed from even these most simple tasks. It just changes in the spirit of, by the way, I never knew how long it took to cook a ramen egg. So I'd always have to go to TikTok, open it, spend 20 seconds watching somebody make ramen eggs.
And then eventually it says, put it in the water for eight minutes. Like, got it. That's all you see on TikTok for the next week. And then I would say, yeah. It's very true. By the way, don't search ramen eggs. You'd get hammered with ramen eggs. But I think – and then all of a sudden you're like, got it, eight minutes. Set a timer for eight minutes. Now just change it. Just ask for ramen egg and Alexa will just determine what you're looking for and give you an eight-minute timer.
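A rough sketch of the fast-path idea from this exchange: explicit commands take a cheap deterministic route, while free-form requests fall back to model-style inference. The cook-time table and keyword matching stand in for what an LLM plus an expert would actually do; none of this is Amazon's implementation.

```python
# Hypothetical sketch of "fast paths for simple things."
# Explicit commands are parsed deterministically; everything else falls
# through to a slower, model-backed path (faked here with a lookup).

import re

COOK_TIMES_MIN = {"ramen egg": 8, "steak medium rare": 6}


def handle(utterance: str) -> str:
    # Fast, deterministic path: "set a timer for 8 minutes"
    match = re.search(r"timer for (\d+) minutes?", utterance)
    if match:
        return f"Timer set for {match.group(1)} minutes."

    # Slower, model-backed path: infer what the user is cooking.
    for dish, minutes in COOK_TIMES_MIN.items():
        if dish in utterance.lower():
            return f"Setting you a timer for {minutes} minutes."
    return "How long do you want the timer for?"


print(handle("set a timer for 8 minutes"))
print(handle("I'm cooking my steak medium rare"))
```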
Okay. So just to wrap this section on the technical side. Yeah. My note that I wrote to myself said they spent too much time building the Alexa microwave and the Alexa alarm clock and not focusing on the technology. Maybe I underestimated the technological lift here a little bit. I don't know. We can't determine what you were thinking for sure. But I think there's a lift here. You said it's a feat of engineering. That's where you started. We have one of the best teams on the planet working on this.
A lot of it has 10 years of history in it. You know, there are so many people that work on Alexa today that have been there since its inception. You've got a lot of passion around that in the engineering team and the product, you know, just the product team all up. We call them product makers when you put them all in a collection.
And yeah, it's a feat. It's okay though. It doesn't matter if somebody thinks it should be easier or it's not easier or whatever. It doesn't matter. Actually, if it feels like it's easy, that sounds pretty good to me. I mean, I don't mind. It means the customer is happy. Like this must have been easy. Like, yeah, okay. I don't care. Do you like it? Like, do you love it? Great. And that's, I think that's where we go. So I want to talk about the vision of this product because, and the strategy that you're going to
put into play here, because again, I was sitting in the audience, and I talked about Apple Intelligence before. I guess this segment of our conversation, I've headlined it: it's Apple Intelligence, but it works. Hmm.
And, you know, it's a little facetious. I tried not to read anything you posted coming in today because I was like, oh, no, I don't want to defend or have a preconceived notion. So that's interesting. You have to keep sharing. We've been talking on the show a lot about how, you know, and yeah, just we talked a lot about the buildup to WWDC, the reveal. And it was a...
it seems like every big tech company has almost the same vision. And tell me if I'm wrong here. But like Apple was like, the Apple Intelligence demo was like, you talk to Siri and ask when your flight is and you're switching flights and it's helping you pick your kids up. And that demo looked a lot like the Google Assistant demo that I've seen almost every year at Google I/O. And then I saw your demo and I was also just like,
This is a similar idea, which is that it's a contextually aware, smart AI assistant that helps you get things done and makes your life easier. So I'm curious if you both see the competitive landscape in the same way I do, if there's something different about Alexa than the others, and how you plan to win given the landscape is developing the way it is.
You want to jump in? I got a long one here. So why don't you just start and then I'll go. I mean, look, the vision for Alexa has been super consistent, actually, for 10 years. I think Panos, it made it into your final deck, I believe, yesterday. We have always wanted to just make lives easier and better, simpler, and be the world's best personal assistant. That's been the vision for Alexa from the beginning.
And so now we just have a technical leap that lets us get closer to that vision. But nothing, you know, that's been the vision for all 10 years that Alexa has been out there. We have a much more capable
AI assistant that's conversational, that is personal and personalized now, that can get an incredible amount of things done for you. But the vision is consistent. I want to go to Panos in a second, but I need to follow up on that because the reaction to this reveal has been,
This is great. It's personalized. It has your data to help you figure things out. But then you look at a company like Apple, which has so much personal data that people have trusted Apple with because it has this security messaging, or Google, which, you know, has your Gmail, your Google Calendar, Google Maps. These are the services that you use to get around the world and interact with people. Yeah.
So, if you're going to be this personalized assistant, like, you are coming up against these companies that basically have already been deeply integrated into people's daily routines. So, what is the play there? I mean, the phone, you're basically asking about the role of the phone. Not just the phone because Google has plenty of services on the desktop. I mean, I'm on an Apple machine. I got…
Gmail open, Maps to figure out how to get places, your calendar, and
So it's almost the operating system for your life. I mean, look, you told us you have Echos in every room in your home, and that's great. That's also true. I'm starting to think maybe I have too much tech. Well, look at your job. I mean, come on. If you didn't, this would be a problem. I'm just saying, customers, you know, we do so much for customers in the home today. And, of course, we're Amazon, so that's not just thank you, by the way, for having Echos in every room in your home. That's awesome. But also we probably put some –
packages on your doorstep and probably stream you some content. And we've got great deep relationships with our customers. Prime is an incredibly valuable program, for example. And, you know, hundreds of millions of customers literally take value in that and love it and use it all the time. So
We love our relationship with our customers, too, and think that we can deeply integrate any services customers want as well. We work with Gmail. We have the Outlook calendar. We integrate Apple Calendar. I think it's a very powerful point. You have to take that and understand, we have this, if you will: you have music, shopping, movies. These are real things that people love doing in the home. I mean, these are personal at every level. Photos.
But also, we're such an open platform with thousands of partners. It's hard to say it's a platform, so I'd be careful with the word. But at the end of the day, every single integration point across Alexa gives us so many of those insights as well. But the key, Daniel hit it when he asked you a question. It might have been rhetorical at some level. I don't think there's anyone close to
being able to understand your home the way Amazon, the way Alexa can. It's a super important element for us, Alex. The idea that smart home is connected to your music, to your entertainment, to your life, the fact that we're now bringing memory to Alexa and you can have that conversation, it'll hold the context for you. I don't think there's anything else like it, because then it's connected to all your services in a natural way too.
I don't think it replaces the centerpiece of the phone. I think it just adds value to your life in a very different way. And I think there might be a little bit of opportunity, and this is me understating it, but the ambient devices in your house right now and the ones that you can buy from us and some of the beautiful products that we're both making now and have released recently, they're in your home. And you don't have to think. You don't have to open anything. You don't have to log into anything. You just have to be there and speak.
And it's a powerful concept when natural language shows up.
Yeah, I was speaking with Jamil Ghani, the head of Prime, at your event yesterday, and he was talking about how the family calendar is on his Alexa device, and it is a Google calendar. So the fact that there is that interoperability, I think, where you don't have a phone, that actually might – maybe that's an advantage. I'm just trying to – It is an advantage. Just think of it this way. We're not asking you to start doing something new that you don't already do.
Right. We just want to make it simpler for you. So Google Calendar is a great example. Okay. Just attach all four of your family's calendars. We'll make it a family calendar and put it front and center for you. And then when you decide if you're going to dinner on Friday night, we'll rationalize it. And, you know, that concept that there's a communal device in your house that everyone can see, you know, it's something that people have been asking for for a long time. But now that you have so much intelligence in the product and it can do the rationalization for you,
Yeah, I feel like we stand alone there. I do think this calendar example is one that helps flip the question a little bit in my mind, because it really is like, how often do you say, well, it was just on my calendar, I didn't know to meet you there. Why? It was on my work calendar. I say that to my wife Tali, you know, all the time. She's like, we missed the restaurant. We missed the reservation.
So anyway, having one spot that can be communal and personal, pretty powerful. I want to press a little bit on this because the phone seems to be the place where people, like it's all about where people interact with these assistants. Yep. The phone seems like it's going to be a pretty important place. It will be. So if you don't have a phone, I mean, again, there's some advantage in that, like you can bring any service in. But if people are on
an Android and they're summoning a Google Assistant, whatever the name is that week, or they're on an iPhone and they're summoning Apple Intelligence or Siri. Where does Alexa fit in on that? Are you going to have to look at deeper integrations with these phone makers? Will they even allow you to do that? I think people use different assistants. I don't think there's any question about it. I don't think there's one.
Although if you lean into Alexa, we have the Alexa app on the phone. And with one touch of the button on your iPhone, you're having the same conversation. You're actually carrying the conversation from your home to your phone, to your car, to your PC with Alexa.com.
We thought that through because we needed that thread for sure. So, you know, as she becomes more personal to you and then, you know, more needed, you want to have her with you everywhere. That app is doing a crazy cool job right now. And we haven't released the new Alexa app yet. It's coming with, if you get Alexa Plus, you get the Alexa app, the Alexa Plus app.
as well as Alexa.com. Alexa.com, right? There's going to be a web version of this. There is. And you'll just see the more traditional long form work that you do with any AI browser at this point. It's the easiest way to say it. But you also get all the personalization. You also get the context of carryover. If you had a conversation in your kitchen, it'll just remind you what conversations you've had lately. If you've booked a reservation, whatever you've done, it'll collect it there. So it'll be on your PC and your phone as well. So I think we just want to
provide that for our customers so they have the opportunity to say, I want my assistant, my single assistant with me everywhere. You might use your phone for different things. You might use a different AI assistant on your phone. I think that's a fair, you know, fair proxy. I don't, I wouldn't disagree. It just depends on what's the best path to get something done. I think Alexa will provide a lot of that best path.
Okay, I want to take a quick break and then talk a little bit about the agentic elements in your new Alexa release, where agents might be going. And then maybe we dream a little bit about where this technology is going to lead. We'll be back right after this.
Hey, you. I'm Andrew Seaman. Do you want a new job? Or do you want to move forward in your career? Well, you should listen to my weekly show called Get Hired with Andrew Seaman. We talk about it all. And it's waiting for you, yes you, wherever you get your podcasts.
Struggling to meet the increasing demands of your customers? With AgentForce and Salesforce Data Cloud, you can deploy AI agents that free up your team's time to focus more on building customer relationships and less on repetitive, low-value tasks. That's because Data Cloud brings all your customer data to AgentForce, no matter where it lives, resulting in agents that deeply understand your customer and act without assistance. This is what AI was meant to be. Get started at salesforce.com slash data.
And we're back here on Big Technology Podcast with two Amazon executives responsible for the new Alexa. We have Panos Panay here. He's Amazon's Senior Vice President of Devices and Services. And Daniel Rausch is Amazon's Vice President of Alexa and Fire TV. So it's interesting that this...
The agentic buzzword is now starting to be translated into things that we're seeing in product. And it's kind of interesting because Alexa has had skills for a while, like call me an Uber. And now you can use Alexa to call you an Uber. So is this actually really a new moment for agentic AI, or is this a rebranding of some stuff that works a little better than it has? Panos, what do you think? I can't get it to work anywhere else.
I mean, I think, at the end of the day, it's incredibly new, but it's also solving so many different things at the same time. First, you have to always go back to how much understanding is in an utterance, just in natural language, being able to translate it. We've talked about this already: getting down to calling a service, calling the right API, making the right partnership so that API is called, to make it as simple as possible.
I don't think it's been accomplished. I don't think you're seeing it out there anywhere connected to an assistant right now. Maybe you've seen it, and you've got to share with me where it is, but I
I don't think you have, I have not. - What, agents? - Fundamentally, like using a core LLM with an agent, non-deterministic, calling the right API, calling that service, booking that service, bringing it back and tying it back into all your other services. - It's a demo we've all seen a thousand times but haven't been able to use, I think, as consumers. - Yeah, okay. I think, yeah, maybe that's the case. I haven't seen those demos myself, but I do, I believe it, I believe it. I maybe just need to watch closer.
But I do think it's new. I think it's new. What we've created and what we're doing and building it up, I think it is. I also think we might mean different things by agent. And so I'm just curious, Alex, what do you mean, just to make sure we're grounded in your definition. For sure there's a grounding difference between us. I highlighted, just in passing, in my own part of our event yesterday, that, boy, everyone just uses this term agent. And I do think people use it in different ways. What does it mean to you?
Yeah, it's such a great question, because I do think that in some ways agent has been used to rebrand automation. We've been seeing automation demos forever. I mean, even – so just to give you one example, I wasn't trying to shade the Amazon demo. Just to give you one example. Yeah. We were all – I mean, a lot of folks watching the tech world were at Google I/O when they demoed a –
voice assistant that would call a restaurant for you and book you a table. And they did the actual conversation, and the assistant has these human utterances, goes, um, well, maybe we could have a table for, and then it would actually go and book you the restaurant. I just don't remember using it. Correct. So again, there's the demo and then there's real life. But I think it was also just like you gave it a command and it would go out and do that for you. But I,
A lot of this stuff, like I said, we've seen demos. We haven't seen it actually work. My definition for agent is something that can go out and accomplish something for you. So...
You had a good demo that I enjoyed watching about trying to go see a Red Sox-Yankees game. By the way, for folks listening, the reveal event was in New York. Daniel's apparently a Red Sox fan. He trolled the entire audience, including the guy sitting directly in front of me. There was a guy wearing a Yankees cap. It was almost like he planned it. I kept saying, are you sure you want to do this Red Sox bit? He's like, for sure. He goes through the entire off-season acquisitions, which –
Alexa's. I mean, as a Mets fan, I will say. You were fine. You were fine. By the way, you saw that. You saw the info expert in action right there. That's what it was. Yeah, because it was not deterministic. And then, of course, it's a different answer every time, Alex, every time Daniel did the demo. At the end of the day, I mean, it was Alexa's decision to talk about Alex Bregman. It wasn't Daniel's. Like, you couldn't lead that. You can't.
You can't plan that. And so a bit of a risky demo because if Alexa decided not to talk about Bregman, I don't know where you would have taken the rest of the call. I do know a lot about the Red Sox, so I figured, you know, maybe eventually we get to buy some tickets is what I was thinking. But it was to set an example of that kind of agentic capability and sort of set the baseline of what we mean, which is, hey, I actually was just having a chat about the Red Sox.
could I get some tickets? Actually, that's a tough game to get. Oh, they're expensive. Can you watch for tickets for me? I mean, that was where we ended up with the demo. Could have ended up in a lot of different places, but being able to set an agent off, if you want to call it an agent in that case, we think about it a little bit differently. But in that case, that agentic capability to say, first of all, I could buy you these tickets right now. Second of all, you don't like the price. I'll watch for you. Infinite patience,
Never runs out of gas. If those tickets do drop below a certain price, I'm notified and can buy them. That's a hugely useful thing for a customer. Yeah, and you can buy it with a command. Yeah. Because you're integrated with Ticketmaster. Exactly. Yeah, so to me, I would say that's agentic behavior.
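A rough sketch of the price-watching behavior described here; the ticketing functions are hypothetical stand-ins for a partner integration, not Amazon's or Ticketmaster's actual API, and a real watcher would run on backend infrastructure rather than in a loop like this.

```python
# Hypothetical sketch of a "watch ticket prices" agent.
# get_lowest_price is a stand-in for a ticketing partner's API.

import random
import time


def get_lowest_price(event_id: str) -> float:
    # Stand-in for a partner API call; returns a simulated price.
    return round(random.uniform(80, 250), 2)


def watch_tickets(event_id: str, target_price: float, max_checks: int = 24):
    """Poll prices and return one at or below target_price, else None."""
    for _ in range(max_checks):
        price = get_lowest_price(event_id)
        if price <= target_price:
            # Here the assistant could notify the user and offer to buy.
            return price
        time.sleep(0)  # a real watcher would wait much longer between checks
    return None


print(watch_tickets("redsox-yankees", target_price=120.0))
```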
Great. I would say it qualifies. We had some questions in, we have a Big Technology Discord, and I was sharing notes with the crew as the event was going on. And we had some notes from people about what they want sort of beyond those simple use cases. I mean, I call it simple, but, you know, obviously there's a tech lift to get it done. So yeah,
One of our listeners said, is Alexa still going to be reactive to requests or can it be proactive and suggest at the start of the day some smart ideas based on the context that Amazon has? For instance, I would say, you know, do I need to order any birthday gifts?
And it would then go out and say, well, looking at your calendar, there are five birthdays coming up. These are the dates and these are our suggestions. So is it going to – because that's, I think, a step further. I think you're stepping in – you said you want to talk a little bit about the future and how proactive Alexa can be. Like there's a balance. One, we think Alexa can be incredibly proactive, to the point of, when you wake up in the morning and walk into the kitchen, it's like, Alex, you didn't sleep well, you know.
And then you can imagine integration with some partners that is like, okay, let's have the conversation. Also say, hey, your day looks pretty packed today. You should probably find some time. That proactivity is there. It's in the system. We're using it in a very different way. We don't want to be intrusive with it. We got to learn from our customers first. Like how much proactivity do you want?
I think it's very, very important to, you know, you don't want to jump to that future. You got to be right. So, yeah, it's a good example. I wake up in the morning and if I need to buy a birthday gift, can you just remind me? We can create reminders. We can create a conversational piece. But I don't think a lot of people want Alexa just to wake up and start talking to you. No, I do think that, yeah. Don't want to be intrusive. You got to be really careful. We got to be so smart about, you know, we have 10 years of lessons. This is what's so awesome about it. And...
you know, how much privacy matters and when you want to invoke Alexa to be part of the conversation versus when you, how proactive you want it to be. And, you know, we have a balance on it, but I think it's a good push. She's already proactive in the spirit of
She has a way to – if I went out there and said, hey, I've been looking for this. I watched this movie last week. What was that song that was playing in that movie? Okay: give it that little bit of information, check Prime Video, what was he watching? Okay, I got it. I think you were watching this movie. It was this song. And
And proactivity also includes, do you want me to play that song, or do you just want the name of it? And a lot of times Alexa will say, do you want me to play it for you? That's a subtle kind of proactivity. It's not intrusive. It's using context, contextual information, some memory, some of your history. And in the past, you've asked me to play it every time, so why don't I just ask you to play it? I think those are different forms of proactivity, but our vision includes Alexa being proactive. It has to be. That we believe the next step customers will ask for is,
I want her more, not less. Right. And so instead of me thinking, oh, I should ask Alexa, is there a point where Alexa will know to ask me? I think that's a real question. I don't think that's today. I think that is the future. And I think, you know, back to where, you know, we're pretty well positioned for that. If that's what customers want, I think we can do it for them. But I think what this listener was asking is, can I just with natural language say, yeah,
take a look at my calendar and tell me something. - Oh yeah, okay, so that's different. That's different. Sorry, I went all the way to my vision, but here's what I'll pitch back. That already happens. So when you wake up in the morning, whoever that listener is, here's the answer, yes. - With Alexa Plus or with-- - Alexa Plus. - Okay. - Sorry, not with Alexa. There's no way it's gonna happen with Alexa.
It's not. But with Alexa Plus, 100%. Wake up in the morning, get your daily brief, tell me what's going on. And Alexa knows what time you start work. We'll warn you of the traffic. You should probably leave by 8:20 if you've got to be there by 9 today. Like that level of proactivity, that's in the system. But you have to engage first.
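A minimal sketch of the "leave by 8:20" style calculation described above; the calendar and traffic lookups are hypothetical stand-ins, not real Alexa APIs.

```python
# Hypothetical sketch: combine the first calendar event with an
# estimated travel time to suggest a departure time.

from datetime import datetime, timedelta


def first_meeting_start() -> datetime:
    # Stand-in for a calendar integration.
    return datetime(2025, 3, 3, 9, 0)


def estimated_travel_minutes() -> int:
    # Stand-in for a traffic/maps lookup.
    return 35


def leave_by(buffer_minutes: int = 5) -> datetime:
    travel = timedelta(minutes=estimated_travel_minutes() + buffer_minutes)
    return first_meeting_start() - travel


print(f"You should probably leave by {leave_by():%H:%M}.")
```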
This idea of Alexa being proactive, like it is – it's definitely – I see where your caution is coming from because there are these proactive notifications that you get with Alexa. I've had to turn them off. Yeah. Yeah. We learned from that. Yeah. So, okay. That's good that there's learning there.
I could go with some other Alexa product feedback, but I feel like let's use our time. Let's stick with Alexa Plus for a minute. But if you want to talk about Alexa, we can tell you if Alexa Plus has fixed your frustration. We'll do that too. Well, the one thing I'll say is I've had – I use it to play alarms and there have been moments where it will play the ad before it will play the song in the morning. But that kind of goes to a question that we did also get in the Discord where people talked about –
Whose assistant do you trust?
And in the back of some people's head, there will be this perspective, Amazon is just going to try to sell me something. Like, for instance, that example of you didn't sleep very well. Like, all right, it's like a suggestion for sleeping pills coming up. I don't know exactly what it is. But like, how do you get past this perception of like, I'm going to get, because you do with an assistant, you trust it with a lot of data. So how do you get to the point where people are comfortable sharing this data? Yeah.
and feeling good about the fact that it won't be used to lead to purchases. I mean, first, I think even before you get to that part of the question, it's just how do you manage a customer's data? How do they see transparently what you're doing, what they've told the system? Do they have control over their data? So that's so paramount that you have to start there, actually. It's like one question earlier than that, which is,
do you trust Alexa? And the answer has to be yes. So we've been building on a foundation of transparency and control. There's the Alexa privacy dashboard, which is one great place to see everything in terms of system settings and your data, et cetera. I just want to make clear all of that carries forward to Alexa Plus. I think that's sort of the important point to make at the top. And then if the question is, you know,
Is the question, boy, should I be offered a product in a given case where a system thinks I need it? I find that great when it's great.
It is great when it's great. Like I found a pair of shoes. I don't even think it was on Amazon, recently, through something I was reading online. And I've got orthotics. And, you know, it's great when it's great, basically. I was referred something. They're awesome. Altras. They have a wide toe box. I'm not going to sell Altras on your show. I'm just telling you that I found them. Altras. If you're listening. We need sponsors. Is this the camera? Yeah.
That's the Altra sponsor. Yeah, Panos, give them a heads-up. We need sponsors. Alex needs sponsors. It's an arcane example, but the bottom line is, it's great when it's great. And why is it great? It's contextual. It's relevant. It's offering me something that I actually need. And so building systems where you can do that elegantly, like customers actually love that. We get feedback that that's great. What's terrible is when you get
inundated with things that are irrelevant to you. And so we're building a system that doesn't do that.
Does Alexa need to have a screen? I mean, Panos, you're the head of devices at Amazon. A lot of the demos that you did at your launch event were with Alexa with a screen. Again, I have like first or second generation Echos in my house. It might be time to upgrade. You should upgrade. There's a couple of things you're missing. One, you're missing speed that you could have that you don't have. And I think speed is time for me. It's comfort, you know, it's confidence. There's so much. First, I would always encourage...
Not just because I want to sell the next device. That's not why. Just having something modern. If your device is nine years old, you're missing eight years of tech. Okay. So I'm judging you. All right. I accept that. Given what you do, you know? And so your feedback is like half heard at this point. But I would say... Okay. I say that, you know, jokingly, but I go, look...
You need a more modern one; it's better. The product's just better, you know, generationally. Generation over generation, it's always gotten better. Does it need a screen? Yeah, it does. It doesn't have to have a screen. It's a better experience with a screen. It really is. Now, let me qualify it because you have a screen in your pocket that works with Alexa. You have a screen on your desktop that works with Alexa. Okay.
The screen in your home, you should have one. It's very powerful. It's nuanced. It's not intrusive. The new design is elegant. It's soft, if that makes sense. It's what you want in the home, something softer. You can get the expression from Alexa from that screen, and she brings visual expressions as much as anything. But here's the trick.
It will come with you in your earbuds. It'll come on your Alexa frames. It'll be in your pocket. It will be in your car. So you don't always need a screen. But in your home, I mean, the command and control, the information management, what you get off of it, it is powerful. Will it work without a screen? Absolutely. Absolutely. And it'll be great. So need is a relative term. I want you to have a screen. Okay. Because the experience is that much better.
And there's a nuance in it. Like when we start rolling out preview, the first customers to get preview will be our screen-based customers because it's the best experience. Okay. That simple. And so you'll be like, I want the preview. And I'll say, okay.
Get a screen. Get a screen. All right. Maybe two screens. And then we'll light up all your Echos, but you need a screen. Okay. Maybe one in the kitchen, one in the office. You only need one. You only need one. Yeah, well, I mean, it's up to you, but yeah. We keep the screens out of the bedroom. At least that's my perspective. Totally. The only screen I allow in the bedroom is the Kindle. It's a cool product, but I'm using mine here now. Yeah, that's great. I'm just taking notes. Listening to you, by the way, I got the alarm-in-the-morning note. I'll get that bug filed. Like, I got you. But the idea that...
Different devices work in different places is real. Right. But I think you need a central hub right now. I think Alexa Plus is so dynamic. And the more you can learn to do, the screen will teach you, like, hey, get after it. You saw Daniel's Thumbtack demo, which was even a little bit more agentic, if you will, for us
than the Grubhub slash – did we do Grubhub or OpenTable last night? OpenTable. OpenTable, yeah. With Uber. Right. But the Thumbtack demo was, you know, a conversation: I need a repair person. Well, that agent goes out and starts booking it for you on the website. And then you need the screen to give you a status, like, working on it, back in a bit, don't worry about it. Okay. I think that is, that's what you want that ambience for in the background. So I think the
It can't be more clear, I don't think. I think it would be great. Okay. I'm sold. I'm going to get one. All right. We're running up on time here. I want to give you both a minute to answer this question and then we'll head out. But it's got to be a minute or your team here will have my head. We talked about how voice AI might be the future of AI or the catalyst for these large language models on the show a while back. Okay.
OpenAI, for instance, debuted or introduced its advanced voice mode with GPT-4o. And you can see the inflection point for ChatGPT: the second they announced that, bam, it goes from 100 million to 300 million users. Is voice AI the future of artificial intelligence?
You know, you start and I'll close this out. I mean, we've believed for a long time that voice is the most natural interface. We're using it right now. We're using it with your listeners. We're using it with each other. It's incredibly expressive. You can load an unbelievable amount of context and power into it. You can be definite, you can be vague, you can be nuanced. And we're born with the knowledge of how to use it; it's completely intuitive. So
I think we do strongly believe that it's one of the best ways to get things done. It is not the only way to get things done, but I do think it's pushing us. It's challenging us to get more and more human, more natural. And that's why it's always been one of the kind of centerpieces of our vision for Alexa. So yes, my answer is yes. And I think it's really pushing the envelope now. Okay. A minute to you, Panos. I think we're at that time where this is the inflection point. And you mentioned it yesterday, you know, the
I believe the vision for Alexa is incredibly ambitious. It centers around voice for sure. I don't think it ends at voice. I think the interaction model needs to be the one that's most natural to you, no doubt. If you need to touch the screen to complete a task, if you need to get to your computer and write the long form, I think it's a flow. And the thing you don't want to do is you don't want to block
the customer from the interaction that they need to go get something done. It's why we're on the phone. It's why we're on the PC. It's why we're in your glasses. It's why we're in your ears. And ultimately though, the anchoring point of all of it is the voice because it is natural. It's innate to all of us. The trick is getting to natural conversation. The trick is trusting that you can just talk and realize that as we talk to each other, pretty sure you can talk that way with Alexa and you're going to find that. And I think that is the transformation that's coming.
I think it finishes – you know, it ends the first chapter and starts the next chapter, and leads us to getting – so finishing is the wrong word there – but getting us to that next leap over the next 10 years. This is that starting point. The technology is enabling it right now and that inflection is happening. And it's compelling.
So it was a longer way to say, yeah, it starts with voice, but I don't think it ends with voice. It never will. It is also innate to us. We as humans, we're always going to find the best and easiest path to get something done. And we think voice will lead to most of that, but not all of it. We don't want to overstate it. Like we...
We will find the best, easiest, which means basically the fastest path to completion, which is why you need to upgrade your devices and get a screen. Are you with me? I told you already I'm buying one. All right, well, get on it, man. We did sell at least one device here in New York. Good news. Thank you. While we're here. Our goal this week was not to sell devices, but we'll do that soon. This is very efficient and scaling.
We're killing it now. We have a new sponsor. We sold a device. Like, this is good. We're killing it. Well, look, Panos and Daniel, I want to just say while we're recording that I don't take it for granted to be speaking on record with Amazon. It's always great for me to be able to hear what you're doing and be able to ask these questions. And I'm sure for listeners, it'll be great as well. So thank you both for being here. Thanks, man. It's been a joy. Thank you so much. Yeah, it's been really great. Awesome. Well, thank you everyone for listening, and we'll see you next time on Big Technology Podcast.