The 50-year quest for AI virtual assistants

2023/9/25
The Vergecast

Chapters
This chapter explores the history of virtual assistants, starting from the 1970s with the initial concept of a conversational user interface at Xerox PARC, to the present day with advancements in large language models. It highlights the evolution of the technology and the enduring appeal of the virtual assistant concept.
  • Initial concept of conversational user interface at Xerox PARC in the 1970s
  • Development of GUS (Genial Understander System)
  • Influence of Steve Jobs and Xerox PARC on the development of the graphical user interface
  • Recent resurgence of interest in conversational user interfaces with the advent of large language models like ChatGPT


Support for the show comes from Crucible Moments, a podcast from Sequoia Capital. We've all had turning points in our lives where the decisions we make end up having lasting consequences. No one knows this better than the founders of some of today's most influential companies.

Crucible Moments lets listeners in on the make-or-break events that defined major companies like Dropbox, YouTube, Robinhood and more, told by the founders themselves. Tune in to season two of Crucible Moments today. You can listen at cruciblemoments.com, or wherever you listen to podcasts.

Welcome to The Vergecast, the flagship podcast of general search queries. I'm your friend David Pierce, and this is the third and final episode in our series about AI. And for this one, I want to talk about one of the oldest stories in AI, and really one of the oldest stories in the tech industry as a whole: the virtual digital assistant. My favorite story about this is one I learned a few years ago, and I've been thinking about it ever since.

So go back to 1979 in Palo Alto, California. This guy named Steve Jobs, you may have heard of him, goes to Xerox PARC, which was this ultra-successful R&D lab owned at the time by Xerox. People at PARC helped create everything from laser printers to Ethernet to personal computers. It was this incredible factory of cool technology.

But the thing PARC is probably best known for is what Steve Jobs saw: the graphical user interface, which went beyond the text-based, command-based computers everybody was using, and actually used images, menus and space, like three-dimensional space, to help people navigate their computers. And as the legend goes, Jobs saw all that, took it back to Apple, made the Macintosh, took over the world. The story is actually slightly more complicated than that.

The Macintosh had been in development since about a year earlier. And it seems like what the PARC visit may have really done is help convince Jobs that it was a good idea worth pursuing and building, rather than, you know, showing it to him for the first time. But that is not what I want to talk about.

The story I want to talk about was happening in a different lab at PARC, where a bunch of employees were teaching a computer to understand and communicate in natural language. They had this idea for what they called the conversational user interface, where, instead of typing commands, essentially like writing code into your computer, you could work with your computer by just asking questions and typing like a person.

Remember, this all sounds very normal now. This is almost fifty years ago. And actually, forget typing as a whole. They thought maybe eventually you'd just talk to your computer out loud, like a person. The team at PARC, in general, thought that a computer should act more like an assistant than a machine.

You'd say, book me a flight to Tucson for next weekend. And it wouldn't launch a program to do that. It would just do it for you. They actually built this thing called GUS, the Genial Understander System, to show off what they were talking about.

And in a paper in 1977, again, forty-six years ago, the researchers included a sample dialogue showing this interface. I'm going to have two AI voices read it right now, just so you get the vibe of how it might work. It starts with GUS, the AI system, introducing itself.

Hello, my name is GUS. I can help you plan a simple trip by air. Where do you want to go?

I want to go to San Diego on May 28.

What time do you want to leave?

I must be in San Diego before ten in the morning.

Would you like PSA flight 102, that arrives at 9:15 a.m.?

What is the next flight?

Air California flight 310, that leaves at 8:30 a.m. and arrives at 10 a.m.

I'll take the first one.

In what name should I make a reservation, please?

Dan Bobrow.

I have confirmed the following flight: PSA flight 102 on Wednesday, May 28, from San Jose to San Diego, that leaves at 7:30 a.m. and arrives at 9:15 a.m. What day do you want to return on?

On Friday in the evening.

Would you like the flight that leaves at 7:45 p.m.?

That's fine.

I have confirmed the following flight: PSA flight 307 on Friday, May 30, from San Diego to San Jose, that leaves at 7:45 p.m. and arrives at 9:30 p.m. Thank you for calling. Goodbye.

I'm pretty sure the voices there wouldn't have sounded nearly that good in 1977. But you get the idea. Obviously, the graphical user interface won out at the time, and won for a long time. That thing you just heard is not how we book plane tickets, and it hasn't been for the last four decades.

But people did keep working on this stuff. The idea of a virtual digital assistant that can just do things for you, where all you have to do is talk to it like you would talk to a person, has been so seductive to so many people for so long.

And in recent years, it actually feels like we're kind of veering back toward the conversational user interface as the future of computing. I mean, ChatGPT is nothing if not a conversational user interface. It's like the whole of the internet in a chat window. Bing is becoming the same thing. Google has Bard, which is, you know, the same deal.

And even with things like Alexa and Siri and Google Assistant, we've been heading down this road for almost a decade. But it actually feels like it's really exploding right now. And this push we're on raises some really interesting questions.

How far does the chat window go? Is the future of computing really just like messaging? And are we actually, with this advent of modern large language models and all of the underlying tech that makes AI smarter and more useful, getting close to the kind of virtual assistants we've been dreaming about for basically half a century? To help me figure out the answers, I called up Richard Socher.

Richard Socher. I am the CEO of You.com.

You.com is actually one of the most ambitious AI assistant platforms I've ever tried. It's partly an AI search engine, so it can answer your questions or help you find information on the web.

That's really where it started. It's also a generative AI tool, so you can use it to write code or create images or come up with brilliant emails that just sound like you. And long term, if Richard has it right, You.com will also become something like the smartphone: a platform on which you can do almost anything, all through this one interface.

Richard has been in the tech industry for a long time, and he's been working on problems of natural language processing for even longer. He actually did a PhD at Stanford on bringing neural networks, which is a kind of AI system, to natural language processing, which was at that time a huge leap forward in rethinking how computers process speech. Point is, he knows the history of all this.

He knows how it works. He knows how hard it has been to pull this off. And he's told me that over the last decade, as he saw this tech get better, he thought this might be the moment to do it.

What he wanted to build was a chatbot, kind of, and an assistant, sure. But he looked at all his research on language and conversation and natural language processing and how we interact with computers, and he decided the thing he really needed to do was build a different kind of search engine.

The biggest application of natural language understanding is in search. That's where billions of people learn something, try to find information, try to get things done, and they start all of these tasks on a search engine.

And so do I want to necessarily call that a search engine? It almost doesn't matter what you call it, but it is that sort of magical spot in the beginning of many people's online journey where you get to be helpful for those users. And so that's sort of what led me to search.

That distinction of what you call it, I actually think, is part of the reason I'm so fascinated by You.com, because I think you're obviously right that search is kind of the dominant interface for this kind of natural language processing, in the idea of just how people search for information.

But simultaneously, even as you were starting to think about how to apply neural networks to this stuff, obviously you were thinking about this stuff before ChatGPT was around showing, you know, new chatbot ideas about how this could work. But you were playing with this technology for a long time. And I can imagine a world that is not that different from this world where you went much more down the, like, Siri, Alexa, Google Assistant road, right?

Rather than kind of building something that looks like a search engine, even to accomplish relatively the same goal, right? Am I thinking about that the wrong way? Like, are those two things further apart in your mind than they are in mine?

You couldn't be more spot on. And it's actually very timely too, because we dropped the word search engine from the...

Interesting. OK.

Now, just to make it very clear that this is going to be a conversation and that we're an AI chat assistant that is going to help you learn and get more things done and help you summarize and all of these things. And so that's exactly where we actually repositioned ourselves.

After many years, we've learned that search engines really do not inspire excitement in people, but an AI chat assistant that helps you with finding information and answers and learning and so on does excite people. They're ultimately the same thing.

That is the marketing question. We still want to be able to say this can be the default in your browser. You know, you can start every question you have in your browser, and in your URL nav bar it defaults to You.com, because that's where you get the most useful first step in many different intents you have.

But we are moving away from the term search engine. And to be honest, and I probably shouldn't say this, but we haven't finalized yet what the new terminology should be. The place we want to be is the most helpful assistant that you can have, which honestly is also something that Google has worked on for many years as well.

Part of what seems like has changed since you started working on You.com is that, I think one thing that you did that was very smart at the beginning was make it look like a search engine, just because I think that's a thing people understand what to do with, right? Like, even in the way that I think a lot of people open ChatGPT now and are like, what do I do with this?

People instinctively understand what to do when they open a web page that looks like Google. So building a web page that kind of looks like Google makes a lot of sense, even if what you're doing on the other side of that ends up being very different. Do you think that's changing, that you can make something that looks less and less like a Google query page and start to bring people along with you and all that other stuff?

Yeah, that sounds like you were in our meetings. That's a very astute observation. It's exactly what we found. Last year, in the summer, we had the first LLMs within a search engine context. And we tried to innovate over the last two and a half years along a lot of different dimensions of what, in the past, we'd call the search engine.

We tried, you know, we had horizontal scroll, where you just see a few sources and then you can always dive deeper into it very quickly. And we had, you know, different apps that are within that search engine context. And what we would often hear from users until late last year was, that's interesting, but I'm just so used to Google, I wouldn't want to switch to anything that's too different.

And we kept getting pulled back by users in our journey to something that is more and more similar to Google. And it was quite frustrating because, you know, we had some good ideas, we thought. And in each of these ideas, we had a hardcore group of power users who did love it.

But it wasn't growing enough, and the majority of feedback we kept getting was, it's too different to Google, I want something more similar, but just a little bit better here and there. And so that all changed, thanks to ChatGPT.

All of a sudden, a lot of users, a lot of them early adopters, but more and more just normal folks too, basically could conceive of a new world of finding and learning about things in the world. So that is kind of the biggest advantage that we've had since ChatGPT came out. And, you know, two and a half weeks later, we launched YouChat, which basically was the first LLM that is connected to the internet and hence can be up to date and factual, and has citations for the facts that it tells you, making it much more...

Okay, we need to take a break, and then we're going to come back and talk more with Richard.

Hey, it's Nilay from Decoder with Nilay Patel. We spend a lot of time talking about some of the most important people in tech and business, about what they're putting resources toward and why they think it's so critical for the future. That's why we're doing this special series diving into some of the most unique ways companies are spending money today.

For instance, what does it mean to start buying and using AI at work? How much is that costing companies? What products are they buying? And most importantly, what are they doing with it? And of course, podcasts. Yes, the thing you're listening to right now. Well, it's increasingly being produced directly by companies like venture capital firms, investment funds and a new crop of creators who one day want to be investors themselves.

And what is actually going on with these acquisitions this year, especially in the AI space? Why are so many big players in tech deciding not to acquire, and instead license tech and hire away co-founders? The answer, it turns out, is a lot more complicated than it seems. You'll hear all that and more this month on Decoder with Nilay Patel, presented by Stripe. You can listen to Decoder wherever you get your podcasts.

Welcome back. Like we've been talking about, the uptick in AI and natural language processing and voice assistants has been happening kind of slowly and then all at once over the last decade or so. And even though Siri still kind of sucks sometimes, the underlying tech really has come an incredibly long way in that time.

But I think if you wanted to pick the moment the modern era of conversational user interfaces really started, it would have to be the launch of ChatGPT last fall. Something about this system that felt like it understood all your questions and responded with human and helpful answers felt like magic to a lot of people, in a way that even Alexa and Google Assistant just never did. ChatGPT isn't perfect, obviously. It's a million miles away from perfect.

But it seemed to make people think that it might be someday, and that there might be something revolutionary in that. This could be the way we access information: not by going to google.com or scrolling stuff on your phone, but just by asking for it. That felt huge. But now it's a year later, and that idea has taken a bit of a hit. For one thing, we've all discovered the many, many, many imperfections of the tech, and also the ways in which conversations maybe aren't all that efficient.

I think we're definitely in the low part of the chatbot hype cycle. It'll get better, but it's not what it once was. Richard, though, started You.com before the ChatGPT explosion. And so he's seen everything rise and fall and rise and fall, and all the parts of the hype cycle here.

And he's seen people poke at the edges of the tech and really try to figure out where it breaks. So what have we learned? What can a virtual assistant do? And if and when the tech really does work, how do we want them to work? Richard says we're still learning. He's still learning, but he has some pretty big ideas about it.

I love text, right? I love natural language processing, but text isn't always the best way to communicate something. Multimodal answers are the best way.

So if you ask YouChat, what's the stock price of Palantir, Salesforce, or some company, we're not gonna give you a bunch of numbers, and especially not those that are hallucinated by an LLM. Instead, we show you a stock ticker that's interactive. You can move your mouse over it, and you can see the daily trends, the weekly trends and yearly trends, the market cap and all of these different numbers.

And you just see it directly within the chat window, right? And then you can ask, like, well, now who's their CEO? And it still captures all the context and remembers which company you talked about before.

And then, you know, it gives you that company's CEO. If you just sent "who's their CEO" to a retrieval backend, it wouldn't give you anything useful. But, you know, we learned to transform those queries, as one of many things going on in the background that you have to get right in a lot of different ways. And it's very easy and obvious when you get it wrong. Easy mistakes to make.
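The query transformation described here can be sketched roughly as follows. This is only an illustration under loud assumptions: the `ChatContext` class, the `KNOWN_COMPANIES` tuple and the pronoun rule are hypothetical names invented for this example, not anything You.com has described. A real system would presumably use an entity recognizer or a language model rather than a fixed list.

```python
import re
from typing import Optional

# Hypothetical list of entities the sketch can recognize (an assumption).
KNOWN_COMPANIES = ("Palantir", "Salesforce")


class ChatContext:
    """Remembers the most recently mentioned company across chat turns."""

    def __init__(self) -> None:
        self.last_company: Optional[str] = None

    def observe(self, query: str) -> None:
        # Record any known company named in the query. If several are
        # named, the one listed last in KNOWN_COMPANIES wins.
        for name in KNOWN_COMPANIES:
            if re.search(rf"\b{re.escape(name)}\b", query, flags=re.IGNORECASE):
                self.last_company = name

    def rewrite(self, query: str) -> str:
        # Update the context, then replace "their" with the remembered
        # company so the query makes sense on its own.
        self.observe(query)
        if self.last_company is None:
            return query
        return re.sub(r"\btheir\b", f"{self.last_company}'s", query,
                      flags=re.IGNORECASE)


if __name__ == "__main__":
    ctx = ChatContext()
    print(ctx.rewrite("What's the stock price of Salesforce?"))
    print(ctx.rewrite("Who's their CEO?"))
```

The point of the sketch is that the second question is rewritten into a standalone query before it reaches any retrieval step, which is the behavior described in the conversation.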

And so we believe the answer in the future is going to be multimodal. And that's why we invented, also last year, this app store, where essentially other people can also submit apps to You.com, and then they can show up within the search results and within the chat response. And I think that will ultimately be the best interface. Similar to, you know, give me some good restaurants around me.

I still remember when speech recognition started to finally work, and I started angel investing after my first company's acquisition. I had a bunch of folks say, oh, we're gonna use speech recognition to do all of these different things, and it's going to be a personal assistant. And OK, well, give me an example. And one of the founders was like, well, the example is, you ask for a restaurant. And I'm like, so you're really saying, I will have this speech system, and I talk to it, and I ask, what are some good restaurants? And the response is, there are one hundred fifty restaurants around you.

Number one has 4.3 stars. Their favorite dish is pad thai, and it's 0.3 miles away from you. And then, like, ten hours later, you don't have a sense of where to go. It doesn't make any sense. And so, you know, I love AI, I love natural language processing, but you have to have some user empathy and try to build it in a way that meets people where they are, and not be too much like, this is the new hammer, everything is a nail.

Yeah, I mean, it's kind of amazing how many systems work like that, the read-you-all-the-restaurants-in-the-neighborhood thing. That's where we are with so many things right now.

I guess the app store thing was going to be my follow-up question, which is, how much of this do you feel obligated to build yourself, versus how much can you kind of create the ecosystem for? And I guess I do still wonder that, especially as that grows and the capabilities grow and there are more things to do. How much of that space is kind of You.com's job, versus being, you know, the platform on which all of the other...

Maybe we should hire you on our, you know, product team. That is the big, the billion, trillion dollar question. I think, OK, long term it's very clear to me that it cannot be all on us, in a system that ultimately lets you accomplish anything and encompasses the entire economy.

But it's also clear that there's a cold start problem here. People don't want to build an app on a platform that doesn't have hundreds of millions of users. Hundreds of millions of users aren't going to come to your platform if you don't already provide enough functionality to fulfill most of their needs in a really, really good way, and some needs better than anyone else or anywhere else.

And so that's the cold start problem. So in the beginning, it is mostly on us. And then I hope over time it transitions and we'll get more and more high-quality apps into the ecosystem. And I think that will ultimately be a fairer way of structuring that first page of the internet.

Currently, you know, Google benefits from everyone else on the internet, but then they don't benefit back that much, other than, of course, paying for traffic and having to pay for even their own ads on their own name, just so that there's not a competitor who outbids them on their own name and then shows up on top, and all of these things. So I think it's a better way and a fairer way to have ultimately a chat portal again. But I think portals also have a bad marketing connotation. And so we might not call it that.

Yeah. I always thought that one of the things you guys did really well from the very beginning was kind of let people just tell you the sources they like better. I've never understood why every search engine doesn't just let me say, I really like these four news websites, and I'm not that interested in seeing other news sites.

It seems like a straightforward thing, and I feel like the app store model really just builds on that in a lot of ways, right? You get to curate it. Like, here is the stuff I want to see in the way I want to see it, and others can tap into that. And that's a pretty powerful kind of feedback that I think is very cool.

Yeah. In the pre-chat days, Reddit was one of the favorite apps of our users.

How do you think about personalization? I mean, it seems to me that the perfect version of this, you know, years down the road, or however long it takes, is an assistant that effectively knows everything about me, right, and knows where I am and what I like and what I'm interested in, on and on and on. And obviously, that has big privacy questions, which I want to talk about.

But also, just logistically, it's very hard to do. And I am just thinking about how search is one thing, where it's just kind of upranking and downranking web pages. But if you're trying to help me do stuff, getting information from me seems really important. But it's also the kind of thing you don't want to do too much of, or you're gonna lose people. How do you think about that?

That's right. Yeah, that's something we have been working on for many months. We have to know when to do this and when it's not that useful. If you just want to know, like, when was Barack Obama born, it doesn't really make that much sense to be like, well, maybe he likes mountain biking too sometimes. You know, it's like, that's not what I asked. I just asked a simple question.

I want to get a simple fact. And so the goal is to not overly personalize, but to know when to do it well and when to reuse the facts, and then also make it seamless for the user. So it just feels like, oh yeah, the more I use this, the better it gets for me, without being too creepy either. And users still having control over their private information in a way that, you know, makes them feel safe is very important.

So what are you seeing as folks are really starting to embrace the kind of chat assistant piece of this? Obviously, the help me write an email, help me with my code things seem to be two sort of killer apps for people. Is there other stuff you're seeing that's jumping out at you as nascent use cases here?

Yeah, tons of things. I think students are going to have a really great time on You.com. We just launched a back-to-school campaign, too. It's now the cheapest way to access GPT-4, plus have access to the internet for that chatbot. And on that front, too, soon we will have a whole new class of questions that are very complex, that no other normal LLM chatbot can currently solve accurately but often pretends to be able to solve, and then does make a lot of mistakes, which I think is very problematic. It's just a whole new class of problems that can be answered with an AI that has more superpowers than just a large language model.

But yeah, for now, I think I still see a lot of potential for parents, you know, for folks who want to find recipes, people who are travel planning and want nice ideas for itineraries that are going to be much more personalized in the coming weeks on You.com, and students getting answers for their questions. I'm excited also to incorporate more and more generative AI capabilities into these. You know, if you ask, like, how can I generate an image with AI, then we just let you do that right then and there. If you are on YouPro, you can do it as often as you want. And so, lots and lots of interesting use cases. I think the area where we are still iterating is when to make it just as good as Google for the quick navigational stuff and the quick information stuff, and then when can we really shine by being much more chat first.

Right. And you do kind of have to do both, right? Even though the navigational stuff is ultimately not that interesting, that is a big part of the sort of habit building of it, right? And if you really want to be the kind of home page of the internet for people, you have to do that as well as or better than Google.

That's exactly right, you know. And there are a lot of chatbots out there that, I think, have their hype moments, but because they are just not connected to the internet, they're not that factual or up to date. Sometimes they even give you citations, but when you click on the citation, you don't see the fact that it was there for. It's not at all on that page. And that loses a lot of trust after a while. And so, you know, there are a lot of sort of brief-hype chatbots, but I think to have a really long-term shot, you need to have the utility that is more similar to a search engine, not just be like a dictionary look-up kind of thing that you use once a week.

Yeah, totally. I think the idea of an AI chatbot as a complementary thing is kind of interesting. But it as a primary interface to the internet is where this stuff starts to get really interesting. And I think that's coming, but displacing Google is harder than most people give it credit for, is the thing I have learned over and over and over in the time I've been in this world.

That's exactly right.

So last thing, talk to me about privacy, just for a minute. I know it's a thing you guys have talked a lot about. I know all of this stuff we're talking about, utility and personalization, often kind of runs directly against privacy.

Do you have a strategy for kind of how to have your cake and eat it too here? Like, how do you think you can do right by all those things at the same time?

Yeah, great question. We used to talk a lot about privacy and we still care about it, but we don't talk about it as much because, to be honest, that doesn't excite enough people. You just can't grow a consumer business quickly enough by just talking a lot about privacy.

A few folks can, and, you know, DuckDuckGo obviously is amazing on that front. But we just didn't see the growth from having a strong focus on it. And it does, in many, many cases, slow you down. Like, people want to have no bugs and they want to know that things work.

If you were to never, ever do any analytics, you don't know how happy you are making your users, right? And so in many cases, you know, we don't use Google Analytics and so on, to have the best privacy of any of these chat engines. And when you look at pretty much all our chat competitors, all of them are selling your chat transcripts and your data to others, maybe not always specifically, but attached to your IP or something like that, and they're all using Google Analytics and all of that.

And so we're trying our best to be the most private of those engines. But to be honest, when you talk about it a lot, there are just some people who will never be happy until you're an open source nonprofit on the Tor network that makes no revenue and has no ads or anything like that. And that's just not a viable business strategy either. And so we care about it, we think we're the best private chat engine, chat system out there, but we just don't talk about it as much as other features that people seem to care about more. And that helped us grow to many more millions of users than before.

A little while after Richard and I talked, You.com rolled out a feature that he hinted at a couple of times in our interview. It's called YouAgent, and it's able to do everything from calculations to code execution for you. One of the examples You.com gives is calculating your mortgage: feed it all the data you care about, the amount of the loan, the time, and the agent will run the numbers and spit a payment back to you.

This is a fairly benign example, right? There are a million ways to do mortgage calculations on the internet. But You.com also says it plans to support things like file uploads and graphical outputs, and much more. So the idea, I think, is that you could upload all of your mortgage docs and then essentially have the agent explain them to you and do things with them, and even do things on your behalf with them.
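For a sense of what "running the numbers" on a mortgage involves, here is a minimal sketch using the standard fixed-rate amortization formula. It is not You.com's implementation. The function name `monthly_payment`, its parameters and the sample figures are assumptions made for illustration, and it ignores taxes, insurance and fees.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortizing loan.

    principal   -- amount borrowed
    annual_rate -- nominal yearly interest rate as a fraction (0.06 = 6%)
    years       -- loan term in years
    """
    n = years * 12        # total number of monthly payments
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        # With no interest, the loan is simply split evenly.
        return principal / n
    growth = (1 + r) ** n
    # Standard amortization formula: P * r * (1 + r)^n / ((1 + r)^n - 1)
    return principal * r * growth / (growth - 1)


if __name__ == "__main__":
    # Hypothetical loan: 300,000 borrowed at 6% over 30 years.
    print(f"{monthly_payment(300_000, 0.06, 30):.2f}")  # about 1798.65
```

An agent that executes code would presumably generate and run something of this shape from the loan amount and term the user supplies.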

That is some seriously cool virtual assistant stuff. I know I've said this already, but it's important to keep saying: the tech for this all still is not finished. The underlying models are getting better, but they're wrong too often, they make things up too often, they don't show their work often enough, and they also just sometimes don't understand what you mean at all.

Plus, I really increasingly don't buy the vision that chat is the future of everything, no matter what OpenAI or Bing or whoever else would tell you. I don't think typing into a text box is the future of the universe.

But I'm more convinced than ever that the super-powered virtual assistants we've been dreaming about and trying to build for more than half a century might finally be here soon. And it's going to be awesome, right?

That's it for the show today, and that's it for our AI series. We have had so much fun making the series, and I really hope you've enjoyed listening to it as well. We've also got a bunch more fun special stuff coming up this fall.

And don't worry, we will be back on Wednesday and Friday with your regularly scheduled Vergecast programming. Until then, AI David, you get one more run of the credits. You do it. OK.

That's it for The Vergecast today. Thanks to everyone on the show today, and thank you for listening. There's a whole lot more stuff from this conversation at theverge.com. We put some links in the show notes, but also read theverge.com. It's a good website. We like it.

If you have thoughts, questions, feelings or other ideas about how to book plane tickets, you can always email us at vergecast@theverge.com or keep calling the hotline, 866-VERGE-11. Send us all your thoughts and questions and ideas for what we should do on the show. This show is produced by Andrew Marino and Liam James. Brooke Minters is our editorial director of audio. The Vergecast is a Verge production and part of the Vox Media Podcast Network. We'll be back with your regularly scheduled Vergecasts on Wednesday and Friday to talk about DVD, VR and all the other news of the week. See you then. Rock and roll.

Support for this episode comes from AWS. AWS generative AI gives you the tools to power your business forward with the security and speed of the world's most experienced cloud.