AWS re:Invent Special: PartyRock Generative AI Apps with Mike Miller

2024/1/12

Cloud Engineering Archives - Software Engineering Daily

People
Mike Miller
Topics
Mike Miller: I got my degree in electrical engineering and computer science from MIT, then worked at several software startups, and eventually landed at Amazon about 11 years ago. My career has not been a straight line; it has been a continuous exploration and learning of new things. Working in AWS's AI and ML organization, I love constantly learning new things. Amazon has been working in AI and machine learning for many years, and we strive to give customers the best choice and cost efficiency for their differing levels of AI and ML experience and needs. AWS offers the custom Inferentia and Trainium chips, designed to reduce the cost of machine learning. For customers without machine learning expertise, AWS provides API-callable services that embed machine learning capabilities into their applications. Bedrock is foundation models as a service, intended to give customers the ultimate choice and flexibility.

Transcript

This episode of Software Engineering Daily is part of our on-site coverage of AWS re:Invent 2023, which took place from November 27th through December 1st in Las Vegas. In today's interview, host Jordi Mon Companys speaks with Mike Miller, who is the director of AWS AI Devices. This episode of Software Engineering Daily is hosted by Jordi Mon Companys. Check the show notes for more information on Jordi's work and where to find him.

Hi, Mike. Welcome to Software Engineering Daily. Hey, Jordi. Thanks for having me. It's really great to be here. So tell us a bit about yourself because we were talking off camera or before recording about yourself joining the dark side, meaning the business side of things, but you actually did study something technical or became fluent in something technical before joining the dark side, right?

Sure. Yep, absolutely. So I got my degree in electrical engineering and computer science from MIT in Boston, traveled my way across the country, had a stint in Austin, Texas, where I worked at a couple of software startups. I mean, I ran professional services organizations, software support organizations, marketing orgs, landed out here in California, had a short stint at Google, another startup, and then landed at Amazon about 11 years ago.

And what were the domains, the products, the services, roughly, that those companies... Did you specialize? Because you have definitely specialized on something at AWS, something that we will be talking about in a minute. Did your career follow sort of like a trajectory? Or was it more or less what you desired with each one of these spaces?

I think when I talk to young people about career plans, I often talk about like your career plan is not just a straight line between point A and point B. It's like sort of meandering around a playground, like finding the stuff that's interesting to you and then spending a little time on that and then sort of exploring something else. And that was definitely my career trajectory. You know, in Austin, it was, you know, enterprise software projects.

startups in California. You know, I worked at Google for a short time, but then I worked at a company called OnLive, which did the very first cloud video games. And so I got introduced to the video game industry and sort of online video processing, you know, and then I moved to Amazon where the first five years I ran product management for Fire TV, helped launch the line of Fire TVs. And that was incredibly exciting, learned all about kind of the content industry there, but also got introduced to Amazon product management. And then about six,

six and a half years ago, moved over to AWS where I worked in their sort of AI and ML organization. So it's been a great variety. And that's something about me that I really love is learning new things all the time. And Amazon and AWS is great for that. Nice. Exactly. So you mentioned AI, right? We were going to talk about the current topic, hot topic of all the software industry and even like everywhere, like...

Now with all the drama happening at OpenAI, we're recording this on the 21st of November, a week before re:Invent. My mother called me yesterday. She's a 77-year-old woman talking about what happened at OpenAI. She knows nothing about it, but everyone's talking about this. But regardless of the gossip, it's because it's relevant. And you've been quite deep into the weeds of AI at AWS for the last six years. So,

at the moment when OpenAI captured the headlines of most media outlets and even my own mother's attention, what was AWS doing? Where in the AI journey of AWS did that announcement catch you? Yeah, well, I guess, Jordi, I would start by saying at Amazon, we've been doing AI and ML for

a very long time. If you think about the very earliest kind of product recommendations on the amazon.com retail website, you know, moving forward from there, you know, you've got the robots in our fulfillment centers, you've got Prime Air, you've got Alexa, all of these products and things that Amazon has been working on have a very high degree of artificial intelligence and machine learning kind of wrapped into them. You know, when I joined AWS in 2017, that was the year we launched Amazon SageMaker.

as this sort of hosted end-to-end machine learning management platform, right? And so it was a recognition at that time that customers needed a simpler way to approach AI and ML and integrate that into their businesses. And over time, you know, we've seen that sort of expand into this sort of three-layer stack, whether, you know, a company has, you know, deep

machine learning expertise. They can get in at the bottom level of the stack and sort of access primitives, if you will. They can use SageMaker in the middle layer as this hosted end-to-end platform. And if they don't want to touch machine learning, we've got these API callable services up at the top layer of the stack that customers can use to embed machine learning into their existing applications. So this has been happening and sort of maturing and growing for quite a long time, you know, at Amazon and AWS. But it seems like the jump in...

Quality, at least the jump in a new direction and a new horizon might be LLMs, right? Correct me if I'm wrong. You're the expert here.

Yeah, 100%, without a doubt. You know, transformer models, which form the basis of these capabilities, really started to come into their own a few years ago. And actually, I ran a product called Deep Composer, where we tried to teach people about transformer models through music and sort of generating musical accompaniments and sort of predicting the next sort of musical notes based on melodies that you would play. We launched that in 2019. But wait, was it an educational project?

Product? Yes. Okay. So if we can define LLMs, or at least the output of an LLM, as the prediction of the next word that makes sense given the previous words, which would be the input. So the prompt would be the request for information, and the information retrieval phase would be

the LLM guessing, word after word, what sequence of words would make sense for the input, which again is the prompt. So in the case of the product that you were describing, there was a prompt, an input of music, a melody, and the output of the product was a suggested continuation of notes. So how did that work?

Yep, that's exactly right. We used MIDI. So MIDI is an electronic representation of notes. So we didn't actually deal with audio files. We dealt with sort of MIDI notes. And based on MIDI notes, we made a prediction of sort of the next notes. And this product was called Deep Composer. And it was actually the third...

of a set of deep products that we built. The first was DeepLens, then DeepRacer, then DeepComposer. And these were all about giving developers hands-on access to this brand new AI and ML technology and making learning kind of fun and hands-on. Because we know that fun motivates people, right? And it's not like reading a dry book or research articles, which was kind of like that at that point in time, that was what you had to do to learn about these technologies and techniques. And so what we wanted to do was put that...

sort of into the hands of people and make it more exciting for them to see and learn. Are those products, by the way, still available? Are you sort of like reshuffling that portfolio into something? Because we will talk about the newest and brightest thing, PartyRock, later. But what about those products? Are they still running? Yeah. DeepLens, we just sort of took off the market. It had sort of reached its sort of useful life cycle. DeepRacer is still going strong. And in fact, we see a lot of pickup from...

large enterprise customers of AWS using DeepRacer internally at their companies to generate excitement and motivation to learn about AIML. So you've got even the world's biggest companies like JPMorgan Chase, they have an entire sort of racing league inside their company for users to sort of use DeepRacer. And what you do is you train these models in the cloud, you download them to a little car, and then the little robot car races around a track.

And so it's a really fun concept. And a lot of our customers and individuals have kind of adopted this as a way to sort of get hands-on, do some fun learning. Deep Composer is still around, but again, sort of, it was a bit early in terms of like the right sort of product fit, but it's still around. And certainly the online experience is interesting.

MIDI has a few limitations in itself, I would argue, too. I mean, not that I'm aware of any that affect the end product, but in my experience, it has a few limitations, but otherwise. So before we jump into PartyRock, which is the product that was very recently launched by AWS that brings us here, what I really wasn't expecting when I was doing research is that, although you just described a very prolonged approach to the hands-on experience and delivery of

ML products at AWS, which you very succinctly summarized just now, I didn't know about AWS's vertically integrated approach, completely owning the stack end to end, right? From the design, deployment, and use of physical machines, so AWS silicon, through everything in between, up to PartyRock, which would be, in my view, the surface level of this stack. So can you describe for us the rationale...

or the reasons why AWS would rather go for this sort of like complete integrated approach for AI in general? Yeah. Well, when we think about customers, right, there's going to be such a wide range of experience and needs and kind of the depth that they want to go to or that they're able to go to in terms of implementing AI and machine learning. And what we wanted to make sure

And AWS always does this, right? We think about innovating on behalf of our customers and providing the greatest sort of choice at the best cost, right, for our customers. And so thinking about that, as we started to build out our AIML offerings, we realized that

we needed something at each layer of sort of this stack of expertise and of capabilities for our customers. And I mentioned it before, but we can kind of go into a little bit more detail. So that bottom level of the stack is for the expert sort of practitioners, the folks who want to get their hands on kind of like in a bare metal kind of sense, right? And at this layer, not only are there the EC2 machine images that have all of the sort of deep learning libraries and capabilities that you need, but we've actually gone a step

further with custom silicon. So there are the Inferentia and Trainium chipsets. And these are chipsets that are designed very specifically to reduce the cost of machine learning functionality, whether it's the training part where you're processing petabytes of data into these models,

or whether it's in the inference part where you're making the predictions. So at the bottom level of the stack, we've got the silicon and sort of the machine images that expert practitioners can leverage if they want to do this work themselves. A lot of customers, though, don't want to get their hands dirty with that. They want to stay at the mid-level. And this is where Amazon SageMaker,

which we released over five years ago now as this hosted end-to-end sort of machine learning platform is really great for a huge number of customers who maybe have some machine learning engineers, they have some data scientists on staff, they do want to get hands-on, they can take advantage of this rich set of hosted capabilities of SageMaker to actually build, train,

deploy, monitor the models that they've got in production. And then you've got this sort of layer on top, which is like, if you're a customer and you're building an app, but you don't have any machine learning expertise on your team, and you just want to embed machine learning capabilities like predictions or fraud detection or transcription or things like that, we have API callable services that an app developer can just call the API, feed the data, get a prediction back, and then integrate that into their apps.
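
For context, here is a minimal sketch of what that top, API-callable layer looks like from an app developer's seat, using boto3 and Amazon Comprehend for sentiment detection; the specific service, region, and input text are illustrative assumptions, not details named in the episode.

```python
# Hypothetical illustration: calling a top-layer, API-callable AWS AI service
# (Amazon Comprehend here) with no ML expertise -- data in, prediction out.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")  # region is an assumption

response = comprehend.detect_sentiment(
    Text="The checkout flow was fast and the support team was great.",
    LanguageCode="en",
)

# The service returns a label plus per-class confidence scores that an app
# developer can embed directly into an existing application.
print(response["Sentiment"])       # e.g. "POSITIVE"
print(response["SentimentScore"])  # {"Positive": ..., "Negative": ..., ...}
```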

So really, at every level of sort of customer interest and expertise, you know, we've built up an offering for those customers. And if I take that and I sort of apply it towards this sort of foundation model space, this new sort of generative AI that has really matured in the last year, it's just this

crazy amount of data that's really powered this, right? And that's where Bedrock sort of comes into play, which we can talk about, but Bedrock is foundation models as a service. So again, we're trying to give customers the ultimate choice and the flexibility because we don't think there's going to be one sort of core foundation model to rule them all. We think customer choice is where it's at.

I would love to delve into the architectural bits of silicon, but that's probably for another episode. In fact, Bedrock is what mostly caught my attention, because what are the basic differences between the models out there? I mean, you don't need to be an expert, especially in those that are closed source. We might not know much about Claude, although AWS and Anthropic, the company that develops Claude, have brokered a partnership very recently, but still, maybe this information is not shared. But yeah,

what kind of different approaches can someone willing to invest in developing their own foundation model take? What capabilities can they leverage at the Bedrock level? And also, if you've already had experience with clients, what are the reasons? Because it feels to me like it's a huge investment in time, but it might be worth it. I'd like to know the reasons why someone, a company, would invest in building their own foundation model with Bedrock.

Yeah. So Bedrock, as we mentioned, is foundation models as a service. Customers can get access via APIs to a variety of first-party and third-party foundation models. Some of them are multimodal. Some of them are text models. Some of them are embeddings. And so each of these models can serve a different purpose just in terms of the type of model.

and then the size of the model. So we've got, for instance, Jurassic from AI21 Labs, there's like a mid-size model and an ultra-size model. And these models perform differently based on how many tokens they've been trained on, right? Or how many data points have gone into the training and what kind of training sets have gone into them. So each foundation model is going to be a little bit different in terms of what it's best at. Like

you know, multi-language, right? Maybe a model has been focused to sort of understand, you know, five or 10 different languages. And so it's better at maybe doing some translation type tasks. Maybe a model has been trained more in using conversational capabilities, like what's called reinforcement learning using human feedback in order to be more conversational. And so the actual interaction patterns with that model might actually tend to perform better in a conversational manner. So you've got a lot of different sort of

data points and sort of configurations that can be used based on what your task is. So as a customer, when I use Bedrock, I can make a choice. I can say I want to make API calls and make requests to model A or model B or model C based on the particular job to be done. What's the outcome that I'm looking for?

And that's what makes Bedrock really interesting is that depending upon the outcome that you want or the job that you have that's to be done, you can actually find a model that's very much optimized for what you want, whether it's the modality, like I want to generate images or I want to generate text or I want to generate embeddings that I can use for search or like the sizes of the model and the conversationality.
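
To make that concrete, here is a hedged sketch of invoking different Bedrock models by ID through the boto3 bedrock-runtime client; the model ID, request body shape, and region below are assumptions for illustration, since each model family defines its own payload format.

```python
# Hypothetical illustration: picking a Bedrock model per job and invoking it by ID.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is an assumption

def generate(model_id: str, body: dict) -> dict:
    """Invoke a foundation model by ID; each model family defines its own body schema."""
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())

# Model A for one job, model B for another -- same API call, different model ID.
# The model ID and payload fields below are illustrative, not from the episode.
result = generate(
    "amazon.titan-text-express-v1",
    {"inputText": "Summarize the following release notes: ...",
     "textGenerationConfig": {"maxTokenCount": 200}},
)
print(result)  # the response shape also varies by model family
```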

But I guess, and I was thinking about this before, who would invest the time and the money to build something upon this SaaS offering of yours?

And enterprises came to mind, which are usually very wary of sharing their data. So if you're going to train these models with your own data, do they do some sort of ephemeral inference when it comes to inferencing? How does that work in terms of sort of like data sovereignty and what happens to the training data that the owner of the data offers to the model?

It's a great question. And as an enterprise customer, this should be top of mind for you is, you know, how secure is this? Do I have the ability to use my own data to improve the performance of the model? And there's actually a couple different ways to do this that we talk about with customers. There's everything from like the hardcore end, which is like retraining or

building a new foundation model, which we don't recommend unless you're like a super deep, large enterprise and you've got rich experience in machine learning. Because there's a couple of different easier ways to adapt an existing foundation model to your needs. And a couple of key ways to do this, one is called fine tuning and one is called RAG.

Fine-tuning actually allows you to enhance the model using your own data and actually do some additional training tasks to create sort of a customized version of that foundation model for you. And now Bedrock supports fine-tuning based on the particular model that you're interacting with. And what happens is if you do fine-tuning on Bedrock,

That fine-tuned model is yours. It's secure. It's only available to you because it's got your data. Basically, nobody outside your VPC can access it. So it's highly secure and that data remains yours. So that's kind of the training. And that's a little bit more expensive, requires a little bit more expertise and data, right? You've got to use data for that fine-tuning. Yeah, a lot of data, yeah.
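
For a rough sense of what kicking off that kind of fine-tuning looks like programmatically, here is a hedged sketch using the boto3 bedrock control-plane client's create_model_customization_job call; every name, ARN, S3 path, base model, and hyperparameter below is a placeholder assumption, not a value from the episode.

```python
# Hypothetical illustration: starting a Bedrock fine-tuning (model customization) job.
# Every name, ARN, S3 URI, base model, and hyperparameter here is a placeholder.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")  # control plane, not bedrock-runtime

job = bedrock.create_model_customization_job(
    jobName="example-fine-tune-job",
    customModelName="example-custom-model",
    roleArn="arn:aws:iam::123456789012:role/ExampleBedrockRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    trainingDataConfig={"s3Uri": "s3://example-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://example-bucket/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)

# The resulting custom model stays private to the account that trained it.
print(job["jobArn"])
```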

Then there's this process called RAG, which stands for Retrieval Augmented Generation. And RAG is this way that's almost, I like to think about it as like post-processing. You're not actually touching the foundation model. You're enhancing its responses. Because what you're doing is you're, based on the prompt, you're sort of looking up data in another system. And then you're providing that data like in the prompt or as part of the context that you're

asking the foundation model so that it can sort of align its responses and give you more accurate responses based on the data that's available. And so RAG is a lot easier conceptually because you're not actually touching the foundation model. You're just using the foundation model as provided, but you're enhancing the output and making it more accurate by using your own data as part of this RAG process.
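
A minimal sketch of the RAG flow just described, assuming a hypothetical retrieve() helper backed by your own document store; the final model call is stubbed out so the example stays self-contained.

```python
# Hypothetical illustration of Retrieval Augmented Generation (RAG):
# look up your own data first, then hand it to the model as extra context.
def retrieve(query: str) -> list[str]:
    """Placeholder retriever -- in practice this would hit a search or vector index."""
    return ["Refund policy: customers may return items within 30 days of delivery."]

def answer_with_rag(question: str) -> str:
    context_docs = retrieve(question)                  # 1. look up data in another system
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n".join(context_docs) +       # 2. provide that data in the prompt
        f"\n\nQuestion: {question}\nAnswer:"
    )
    # 3. send the enriched prompt to the unmodified foundation model
    #    (e.g. via the generate() helper sketched earlier -- stubbed out here).
    return prompt

print(answer_with_rag("How long do customers have to return an item?"))
```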

One thing that I liked about the foundation models provided by AWS, which I'm not sure are part of Bedrock but probably are, I'm talking about the one specifically running behind CodeWhisperer, which is trained on public data, obviously. One thing that I really liked and found unique to CodeWhisperer, and there may be other companies and other products like CodeWhisperer offering it, but not to my knowledge,

is that, whatever they offer, in this case code suggestions, they will link back to the code snippet that,

I'm hesitant to say, generated the output, because it's not quite true. The output is generated by an inference process that was triggered by a prompt. But when there's an incredible similarity between the output and potentially snippets of training data, which again comes from public data, not from clients, they will establish some sort of, quote unquote, provenance, which I find very neat, and not only very neat but also good open source citizenship.

Let's call it stewardship, if you wish. So what are your thoughts on CodeWhisperer, by the way? So first of all, CodeWhisperer, amazing. And you're absolutely right. There are foundation models that back it up. CodeWhisperer is awesome. From that sort of provenance kind of perspective, we will also check the code that's generated: are there existing licenses associated with those code snippets? And we sort of surface that

data, which is super critical when you're building enterprise software products. I think what's interesting is you can take that model of citations or where did we get the data to justify this answer and surface that to the user. And I think that's something that a lot of the providers of applications that are built on Bedrock are doing these days because it earns trust.

with a customer, right? Especially somebody who's using gen AI, you know, we all know there's a lot of hallucination that can be involved. You want to sort of see the provenance, like you said, I love that word, or the citations for where the data that's being generated is coming from. And the more that you can do that, the more you can, you know, earn trust with users and, you know, have them become more familiar with, you know, what's happening. And I'd love to sort of tie this into, I know we're going to get to Party Rock, but sort of

Earning the trust of customers and getting them to have some intuition about how these things work was one of the key reasons that we did this new software, this new product called Party Rock. Yeah, how so? Because you seem to be quite obsessed in a good way with educational...

Sort of like delivery of AI. So I find Party Rock, it's been now for how long? Two weeks? Yeah, a week. A bit more maybe? No, one week. Less than a week. So this is a playground, right? So it talks about how playful this thing is already from the get-go, which is fun and funny. The layout, the design of it is sprinkled with flashy colors and you can modify most of the...

items in the layout and so forth in the UI. So tell us about... Yeah, where it came from. Exactly. What were you guys up to when you launched this? And then we will go into what people are building with it. What is the long-term vision, and how does it fit into the end-to-end stack, the integrated approach that we just described a minute ago?

Absolutely. Well, let me give you the backstory. So generative AI exploded early this year. We were building Bedrock and we've already had foundation models that we've been working on for a while. And a lot of our developers across the company started to sort of wonder like, hey, how can I apply this to my job or to my particular service that I'm building at AWS? And one team started to build their own little miniature playground.

But they did it in a unique way. It wasn't just a text box with an output, right? It was thinking about it as like widgets that you could like connect together. And they started to sort of build a few demos and then make this thing available. But then they thought, hey, wouldn't it be cool if like, as I sort of built these prompts and sort of connected them, I could, you know, show them to my coworkers. And so what they did was they made the URL of these sort of applications that you were building shareable.

And this started to spread like wildfire around our company. Folks like sharing URLs. Hey, have you seen this playground thing? Like, check this out. I mean, myself, I saw it. You know, somebody sent me a Slack message. They're like, hey, have you seen this thing? And like, I started playing with it. You know, the light bulb went off. Like, hey, wait, this is actually like an awesome, fun thing.

And it gets you using prompt engineering and the foundation models and choices and sort of thinking about all of the sort of things you have to think about when you want to build an application using generative AI. And so we said, hey, look, we've done this three times already in the past.

you know, finding these like cool hands-on ways to bring this new technology to customers. And we knew that this is, you know, generative AI is a step function change in terms of the capabilities. So we've got to figure out how do we get this into the hands of more people faster. So we took that little germ of an idea and we productized it. And, you know,

You know, it's reflected in, you called it out, right? The UI is very kind of fun and playful. The name, Party Rock, right? It's an Amazon Bedrock playground. You know, we wanted to make it clear that this thing is about play. It's about like low risk. You can experiment. You're not necessarily in a production environment like touching stuff, right?

This is all sort of for you, entertaining. We see people building apps, everything from recipe generators based on what ingredients are in my pantry, bike route suggestors, hey, I'm in this city or the weather is like this, where should I be biking? People have actually been building prompt engineering tutorials inside of Party Rock to teach people about prompt engineering using prompt engineering.

So it's been a really fascinating sort of road. And we're really excited about the adoption so far of people finding it fun, finding great ways to get into prompt engineering and start building that intuition for how this stuff works. To me, it also solves some kind of collective creativity problem in a way.

The last few years have seen a rise in cloud IDEs, so the ability to have your IDE, your compute instantiation running somewhere different than your laptop, and being able to share those environments, not only code using VMs and

AWS, for example, but also being able to share that with other co-workers, right? And see how the application is running, or how the source code of your application is doing certain things, with other co-workers. But that is limited to those that are able to program, that know about development and stuff like that. And this, from how you described it, what I've been playing around with, and what I've seen people creating, seems like that for the no-code developer, for the low-code developer, right? In a way, it's like...

Many people do not know how to code and yet would like to collaborate on the creation of applications. I feel it's a bit ambitious to call what the outcome of Party Rock

is an application, but I see what you mean in any case. And I see, again, the same need to create and share applications built with PartyRock as I see a parallel with the rise of cloud IDEs and shared environments among developers. Do you agree with that? Yeah, I think that's a great observation. And from the PartyRock perspective, we felt that

not just sharing, you know, what you did so others could see it, but this ability to take inspiration from others and make it super easy to build off of what somebody else did, which is how we got to introducing the remix functionality. And so, with just a single button, you can, in engineering speak, fork the code, and basically make a copy of that code in your account and then build off of it. And we

And we wanted to make it so that you could find inspiration in what other people were doing and then learn how to tweak it in this sort of low-code, no-code way. Now, AWS has been doing low-code, no-code for a while, right? We've got SageMaker Canvas. There's other sort of really interesting products focused on building production apps using sort of low-code, no-code kind of capabilities.

And so you're absolutely right. Party rock, we're not trying to sort of displace any of those. This is very much a sort of for fun, sort of entertainment purposes kind of way. And again, I kind of go back to what we wanted to do is we wanted to kind of democratize the accessibility of generative AI to the widest number of people. And that's also where you see how we thought about

The access, right? You just need a social login to get access to Party Rock. You know, we've got a rich sort of temporary free tier of access that people can use. So we want to make it really easy for folks to get in and sort of use this thing without needing to be a software developer. So is there then a holistic, global

approach to AI in general at AWS? Like, does this fit some sort of vision that Werner or someone has at the company that is being delivered and probably announced at the next re:Invent? What is the general vision for the next six months

to one year. Yeah. Well, we've already talked about this multiple times is how do we reach like millions and millions of users and educate them on AI and ML? And we were even thinking about this and implementing steps to do this even before the generative AI craze because we knew that

AI and machine learning is sort of this seminal moment in our culture. And for us to sort of do better for the planet, the more people and the more diverse backgrounds, the more different perspectives we have from individuals who know about machine learning and artificial intelligence, the better off we're going to be. So for instance, like last year, we launched an AI educator program that was focused on community colleges where we provided resources and training for

professors at community colleges to, you know, increase awareness and expertise around AI and ML. We launched an AI ML scholarship program, right? So yes, all about sort of democratization.

This actually prompts a question. In your own experience, but also with your clients and with the users of all these programs that AWS very generously released, what is the biggest friction in learning prompt engineering or prompting LLMs? What are the biggest problems, the most frequent ones, the most common ones that you have come across?

I mean, prompt engineering itself is fairly straightforward. I think it's in a lot of the nuance, right? So it's a bit of mix of art and science, which is a little cliche, but it's true, right? If anybody who's, you know, tried to write a prompt to generate something that they want to get out, there's a little bit of an art to it. And

I think with PartyRock, that's what we were after is providing this low friction sort of fun environment to kind of experiment and see. I think there's also some really interesting stuff with prompt engineering that you can do that's related to chaining and sort of taking the generative AI sort of text that's generated and using that as input information.

for more generation, right? And so that's where Party Rock and sort of chaining widgets and also introducing the multimodal capability, right? So you can generate an image and then you can even have that image passed as input into another widget. It allows people to sort of make these connections and sort of the light bulbs kind of go off of

oh, I see how these things can fit together. I see how the results of my prompting and maybe even the results of some deeper configurations, you know, top P or temperature impact the output from the LLM. And so we sort of gently introduce those concepts to users through Party Rock. Yeah, exactly. So,
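
To illustrate the chaining idea, here is a small sketch where the text generated by one prompt feeds the next; PartyRock wires this up visually with widgets, so the complete() helper below is just a hypothetical stand-in for a foundation-model call, not anything from the product itself.

```python
# Hypothetical illustration of chaining: text generated by one prompt becomes
# part of the input to the next, the way PartyRock widgets pass output along.
def complete(prompt: str) -> str:
    """Stand-in for a foundation-model call (e.g. the Bedrock helper sketched earlier)."""
    return f"<model output for: {prompt!r}>"

ingredients = "eggs, spinach, feta"
recipe = complete(f"Suggest a simple recipe using only: {ingredients}")  # widget 1
shopping = complete(f"List extra pantry staples needed for: {recipe}")   # widget 2 reuses widget 1's output
print(shopping)
```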

I think it's generally true, the way I'd phrase it, that LLMs do not hold much state, do not have a lot of memory, don't know what the previous input, or the inputs before that one, were. So the ability to chain those is a good one. And also people are not so aware...

or not very aware, that generating images, speech to text or text to speech or text to text, all of those things really leverage quite different models. So you need a really good, strong technical background. And I mean a literal background, something running behind the scenes to connect those things in a seamless way, in the multimodal way that you were describing. But yeah, can you describe, in elaborate fashion, the last two features that you mentioned, temperature and top-p?

Yeah, absolutely. So these are two parameters that you can provide into LLM prompts. And they basically, the best way I think of them is they control kind of the creativity of

of the output of the model for that prompt. And I can give you a detail about like temperature, for instance. So you explained at the beginning, the way some of these large language models work is they, based on like the previous text, they guess what the next sort of tokens are, the next words or groups of words are, right? And so if you think about the probability, you know, the model has like a probability of like, let's say you got this phrase, okay, what's next? Well, we've got this probability. Here's this one's like,

30%, 20%, 10%. Now, if the model always picked the most probable next word, what would happen is you get really dry output and you might even get into loops where the model output just says the same thing over and over and over again because you keep hitting that same high percentage of output, right? And so temperature allows you to kind of add a little bit of entropy or add a little bit of randomness into like

how far down should the model go in sort of the next token selection? So it's not always picking the most probable next, but maybe one time it picks the second down, one time it picks the fifth down, one time it picks the most probable. And so you can use that temperature to kind of influence some of what looks like the creativity of the model, but it's actually just

dynamically sort of changing the choice of the next token, right? And so if you dive deeper down into that probability stack, something that's like, oh, maybe this is only 5% compared to 20%, you're going to get sort of a more interesting, dynamic output. And so that's why we give that control to users, so they can get some intuition about how these configurations can actually impact the output. Hopefully that explanation made sense. It does actually. And yes, you're right. It gives the LLM model...
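
A toy numerical sketch of that temperature effect, using a made-up next-token distribution rather than a real model; top-p, by contrast, would trim the candidate list to the smallest set of tokens whose probabilities sum to p before sampling.

```python
# Toy illustration: temperature reshapes the next-token probability distribution.
# Low temperature almost always picks the top token (dry, repetitive output);
# higher temperature lets less likely tokens through, which reads as "creativity".
import numpy as np

rng = np.random.default_rng(0)
tokens = ["the", "a", "dog", "quantum", "banana"]
logits = np.array([4.0, 3.0, 2.0, 0.5, 0.1])  # made-up scores, not from any real model

def sample(temperature: float) -> str:
    scaled = logits / temperature          # temperature rescales the scores
    probs = np.exp(scaled - scaled.max())
    probs = probs / probs.sum()            # softmax -> probability distribution
    return str(rng.choice(tokens, p=probs))

print([sample(0.2) for _ in range(5)])  # low temperature: almost always "the"
print([sample(1.5) for _ in range(5)])  # higher temperature: more varied picks
```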

a bit of freedom in terms of selecting the next one. Or it says to the model, don't follow the underlying principle that would force you to choose the highest probability word next, but actually become a bit less strict and select maybe the second most probable word or the third one, as you said, and that will make the course of the next selection and the next one and the next one certainly different, at least different from the most probable path.

So yeah, that's quite fun. Yeah, yeah, yeah. I mean, again, the product is literally fresh out of the oven. A week ago it was released. But what is next for Party Rock? This looks like a wild experiment that people are having a lot of fun. Does it fall into the context of something bigger? Is it going somewhere? Again, what is the context for this?

Well, I'm certainly unable to tell you what our roadmap or product plan is, but I can certainly say, you know, at Amazon and AWS, we're always listening to our customers. You know, we love getting feedback. We love thinking about like, how can we sort of solve the next problem for them? I think it's pretty clear, like in terms of the capabilities and the functionality, there's a lot more that we can do with the product. Everything from like,

being better about sort of the sharing aspect and kind of highlighting apps that other people share. Because today we just ask people, hey, tag them with a hashtag or share them on your social network. There's no like sort of central place for users to go. So I can imagine there's, you know, some improvements around there we can do for like discovering these things. You know, there's other sort of capabilities that, you know, we can add to our widgets. We can

add more widgets just to make these things that you build more app-like, right? You can maybe have selection dropdowns or things like that. So I think those are sort of a couple of ways that we're thinking about this. And certainly we're an Amazon Bedrock playground. So as Bedrock releases new capabilities, new foundation models, enhances their offering, we will strive to roll those things into Party Rock and present them in a fun, sort of quirky way for users to learn about

and take advantage of. So you can imagine that as Bedrock improves, those things will show up in Party Rock eventually. Now that you mention Bedrock again, which is, I think, the most fascinating piece of the complete end-to-end stack. I mean, as I mentioned before, we wouldn't delve into the custom silicon that AWS is probably designing, and I promise to keep that promise.

But now that you mentioned Bedrock, what is Titan LLM? Is this one of the first-party models that you mentioned at the beginning? Has it been developed by AWS? Can you elaborate on what Titan LLM is? Sure. Yeah. We announced Titan some time back. It's currently in preview. It is a first-party foundation model, so an Amazon-built and trained

foundation model. We talked about the text version. So it's an LLM that handles prompts and text output just like other LLMs. And a lot of our customers have access to this in preview. I don't know when it's going to be generally available, but it's one of the first-party models that's inside of the Bedrock offering.

Do you have any details about the tokens, the size, the typical questions that everyone asks? Like with cars, how many liters fit into that engine, or whatever? Do we have the data for that? I don't have those numbers off the top of my head and I wouldn't want to speculate. I want to make sure we give you accurate data. So, Jordi, I can follow up and you can post that for your listeners. Absolutely. I will, I will.

So then what are your expectations about re:Invent? It's taking place next week. What kind of announcements? I know you can't give us any scoop about that, but what is the nature of the feeling at AWS that things will be announced over there? What sort of direction or what direction would you like the company to follow in your own words? If you were Werner Vogels yourself, where would you take AWS?

Oh, wonderful question. I mean, re:Invent is such a special event for everybody at AWS. There's so much work that goes into it, but we love the fact that it brings all of our customers together, and it's our best chance, or one of our awesome chances, to really sit down and talk to customers in workshops and in chalk talks and keynotes, and really sort of generate this dialogue with them, to sort of bounce ideas off of them, find out what they're interested in, what problems they're solving, right?

And for customers to learn from other customers, right? There's a lot of sessions where our own customers are on stage talking about, hey, I used AWS in this way and I solved these kinds of problems. So really for me, it's, I mean, yes, the announcements and the new stuff are always like super exciting and customers get pumped about that. But for me, it's about us learning from our customers and customers learning from each other as part of this giant conference. And I think that's really the most special part for me because obviously I don't know all the details about what we're announcing or sort of where we're taking it.

But there's always lots of really great stuff that's announced and really sort of spectacular innovation that's happening. And I have no doubt we're going to do the same thing again this year. Yeah, the announcements take all the headlines. That's true. But you're absolutely right. One pays for the event ticket or any other event that is similar to that because of the

Use case cross-pollination, yeah? You're there to hear from or spy on competitors using AWS, right? Or just listen to providers, listen to your own clients, because everyone's using AWS. So you'll find a whole ecosystem there. Your whole supply chain, in a way, a digital supply chain in this case, at AWS re:Invent, because it's a massive event. And yeah, the cross-pollination of ideas is

I think the main reason why one would purchase a ticket for that, because it's fascinating. It's absolutely bubbling in the air. And what's great is regardless of your industry, your company size, sort of where you live on the tech stack, I mean, there's representation from everyone there, which is what's super interesting. So you always find a peer, right, that you can kind of learn from.

So then, for yourself, what are you going to work on next? So Mike Miller, director at AWS, what is it that you're working on in the next few months to a year that...

the audience would like to know. Lots of exciting stuff, primarily around generative AI and sort of related topics because, you know, just given the experience and sort of where things are happening. And my role is really to help our individual teams just innovate faster and think about sort of cross-pollinating across the company. How do we sort of connect people and connect ideas and think about what can we do with generative AI to innovate on behalf of our customers? And that's really where my focus is.

Do you see any particular vertical pulling especially strongly from the gen AI capabilities that AWS is offering right now? Is there any particular vertical that you're surprised by, one that is using these capabilities in a way where you think, I didn't expect this from the financial sector, for example?

I think we'll have an opportunity to see some of that at re:Invent. I think, to be honest with you, what's surprising is the breadth. Like, it's actually like everyone is thinking about it and sort of wondering how it can be used to improve their business. And, you know, I think that's where sort of Bedrock and these offerings really come into play, because it's a

cloud-based sort of foundation models as a service offering that you can fine tune or use RAG to sort of customize. And it's not one model to rule them all, right? We've got a wide variety of models from first party and third party providers that do different things, right? We've got multimodal, we've got text, we've got embeddings, like all of those things are going to be represented in Bedrock, which is going to be super interesting for our customers. Well, Mike, thanks so much for joining us today.

Thank you.