
The Alexa Prize Challenge with Stanford's Abigail See and Ashwin Paranjape

2020/8/20

Last Week in AI

People
Abigail See
Ashwin Paranjape
Topics
Abigail See: Alexa lacks the ability to hold non-task-oriented social chit-chat and sustained, natural multi-turn conversations; Chirpy Cardinal's goal is to improve both. Building a chatbot from scratch is very hard: you have to design everything from the ground up and iterate quickly in response to user feedback. The team adopted a mixed-initiative design that lets users drive the conversation, using a priority system to switch topics based on user interest. To support high user initiative, Chirpy Cardinal tried to cover as many knowledge topics as possible, including obscure ones. The team experimented with using large pre-trained language models (such as GPT-2) to converse with real users, which is very different from traditional Mechanical Turk-style dialogue experiments. Neural generation models have limitations in dialogue and are prone to common-sense and social-sense errors. To make neural generation more reliable, the team limited the number of turns it handled and carefully designed leading questions. The team fine-tuned GPT-2 to be better at particular kinds of conversation, such as conversations about everyday experiences. A second fine-tuned GPT-2 model handled world knowledge, weaving it into the conversation in a more natural way. To avoid offending users, the team avoided controversial topics, which limited the chatbot's capabilities and disappointed some users. Chatbots could in the future become tools for spreading factual information and fostering social discussion, but they also risk being maliciously exploited. One direction for future work is to open-source parts of Chirpy Cardinal's code and to improve emotional understanding and response mechanisms, for example by designing more effective leading questions.

Ashwin Paranjape: Good processes and practices are critical for team collaboration, letting the team adapt to change quickly and keep improving. Taking part in the Alexa Prize challenge is like running a startup: you need to ship a minimum viable product (MVP) quickly. Because of time constraints, the team had to iterate fast and adjust rapidly based on user feedback. Traditional dialogue-tree chatbots offer low user initiative and struggle when users switch topics. Using GPT-2 for world knowledge has limitations, such as factual or grammatical errors. Pre-trained language models can "hallucinate", generating inaccurate information. Users enjoyed that Chirpy Cardinal could discuss niche topics, and they liked probing the system's boundaries. Chatbot systems are not in a steady state; users constantly test the limits of their abilities. Chirpy Cardinal still has plenty of room for improvement on user initiative and cannot fully satisfy users across all topics. Asking users too many questions causes fatigue, so system initiative and user initiative must be balanced. One direction for chatbots is to be able to answer questions at different levels and adapt to different conversational contexts. Open-domain chatbots face a "Turing test"-like challenge, requiring competence in many areas at once. Conversational AI is a full-stack NLP problem: you need some competence across many NLP subfields to build a good user experience. Chatbots risk reinforcing filter bubbles, but they could also become bridges connecting people with different viewpoints. Future work includes improving the knowledge-handling module so it weaves knowledge into conversation more accurately and naturally, and improving initiative mechanisms so users can more naturally drive the conversation.


Chapters
The Stanford team discusses the difficulties of assembling and managing a large team for the Alexa Prize Challenge, emphasizing the need for rapid development and user feedback.

Transcript


Hello and welcome to SkyNet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what's just clickbait headlines.

I'm Sharon Zhou, a third-year PhD student in the Stanford Machine Learning Group, advised by Andrew Ng. In this interview episode, you'll get to hear from a pair of PhD student researchers, also at Stanford, who took part in this year's Alexa Prize competition, and they led their team to second place.

So first we have Abigail See, or Abby, who's advised by Professor Chris Manning in the Natural Language Processing Group and is focused on improving the controllability, interpretability, and coherence of natural language generation in open-ended settings such as story generation and chit-chat dialogue. And additionally, we have their co-lead on the project, Ashwin Paranjape, also a

part of the Natural Language Processing Group and advised by Professor Chris Manning as well. And his focus has been on making open domain social bots sound more conversational when they talk about external knowledge pieces. So thank you both so much for making the time to be on this episode. Thank you for having us. Thanks.

So a quick little bit of background. The Alexa Prize Social Bot Grand Challenge is a competition for university students dedicated to accelerating the field of conversational AI. And the competition is focused on creating a social bot, which is an Alexa skill that converses coherently and engagingly with humans on popular topics and news events.

Abby and Ashwin, do you want to give a little bit more context on the goal of this competition? Yeah. So Alexa, the virtual assistant, I think many people are familiar with and it's been in development for quite a few years now. And it can do all of these different practical skills like booking movie tickets or playing music or turning on and off your lights.

But what it's lacking is the ability, two abilities really. One is to be more social and chat to you about things that aren't really oriented on some specific task. And the other ability it lacks is to talk to you over many utterances as a real conversation over many minutes. So the Alexa challenge is about university teams trying to build chatbots that can have that kind of a conversation.

Cool. So this was the first year for Stanford to participate in this competition. And so it was a huge team effort and it obviously took a lot of time to compose a team and to jumpstart this process.

Before we dive into, I guess, specifics of your solution, how was the process of getting a team together and getting started on this competition? Well, I think it's really hard, to be honest, because first of all, if you're starting out completely fresh, you basically have nothing to, you know, sort of try to improve. So you have to think everything through from the ground up.

And then the other thing is you couldn't tell people what exactly they were going to be working on apart from a very vague idea of what the Alexa Prize is. But I think it really helped us that we focused on...

putting in the right processes and sort of the right practices from the beginning. And I think that really paid off because as time went by, we could make changes really fast.

and not have them negatively impact things, but rather just have them positively impact all the time. It was a little bit like running a startup for a year. So disclaimer, I don't actually have experience running a startup, but I think there were some parallels in that we were assembling this small team and we were trying to get really quite an ambitious thing started

up and working in quite a short amount of time, and we were trying to deliver minimum viable products quite soon. So be able to have the bot talk about certain things at at least a basic ability level before we build the next thing. So we weren't able to just develop for many, many months and then put it out to users. We needed to actually start delivering a chat experience to users within only a few weeks.

So yeah, we had to move pretty fast and we had to react quickly to the feedback we were getting from users. And how did you recruit those users? Or like, how did you get the users to test your chatbot? So the Alexa Prize is run on all Alexa devices. So the idea is that during the competition, and in fact, still today, if you say let's chat to an Alexa device, then you get routed to one of the random chatbots from the competition.

So the idea is that pretty much anyone who has an Alexa device in the US has been able to do that. And many, many of them have been doing it every day over the last nine months. And they've been talking to our bots. That's really cool. Do you know what the reach is for your bot? Um...

It was definitely like a few thousand conversations per day for us, right? Yeah. I mean, not all of them are super long. Sometimes they are accidental triggers, but yeah, I would say a few thousand every day. Very cool. So let's get into the chatbot itself a little bit. So the chatbot's named Chirpy Cardinal, and your paper on it is titled Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations.

And your paper notes that, quote, our social bot engages users on their terms, prioritizing their interests, feelings and autonomy. Could you give us kind of at a high level how you implemented the bot and what you did to make it capable of doing that? Yeah. So we set out to have this kind of user first design. And one of the things we really wanted was to

allow the user to drive the conversation. So there's this concept called initiative in dialogue, and initiative is how much one party can drive the conversation. So with chatbots, because it's very difficult to be able to adequately respond well to everything, often we build chatbots to be high system initiative, meaning the system, the chatbot, mostly drives conversation, decides what we're going to talk about, and then the user is in a more passive role of just going along with that.

And that works better because you can use the initiative to stay within the realm of the things that the chatbot can talk about. But that is not the most engaging or fun experience for the user because they don't get to exercise as much autonomy over what they talk about. They don't get to choose to talk about the things that particularly interest them and so on. So one of the things we were trying to do this year was to have a more mixed initiative system. So that means that both the chatbot and the user

are able to drive the conversation and share the initiative. So this is a really tough thing to get right, and I think there's still a really long way to go. But I think in small ways, we did try to be more kind of flexible and responsive and adaptive to what the user was talking about.

So, for example, a lot of the Alexa Prize chatbots this year and in previous years had essentially kind of dialogue trees, right? Where kind of like a flow chart, you know? You would ask the user a particular starter question such as, you know, for example, who's your favorite musician? And then based on who they say, maybe you go to your knowledge base and then you find something about a song and you say, what about this song? Do you like this song? Et cetera. And you kind of hand engineer a dialogue tree like that. So these things are not very high user initiative because the bot has kind of mapped out its own path of the conversation.

So although we had some conversations like that, we tried to make it more responsive to if the user wants to switch topic. Because in the worst case, if the user tries to switch topic in the middle of one of these dialogue trees, then these systems can really fail to detect that the user is trying to talk about something else and it just keeps dragging the user down this dialogue tree.

So one thing we did is we had this kind of priority system where we have many different response generators within the bot that can talk about different topics. And they're all kind of listening to the conversation at the same time. And if any of them detects that the user is maybe more interested in talking about some other topic, then that response generator will interrupt and take over control in order to talk to the user about that topic. Oh, could you give an example? Yeah.

Yeah, so we kind of track which entities the user is talking about. So, for example, if maybe we're talking about movies and then we say...

you know, what's your favorite movie? And then the user kind of takes the initiative to take things in an unexpected direction. And they say, oh, I'm not really into movies. I'm more of a TV person. Then, you know, maybe we'd have some other response generator that's listening into that and then realizes that the new topic or entity of conversation is TV. Or maybe if the user had, you know, mentioned a particular TV series, like they say, I'm watching Doctor Who or something, then we notice, oh, Doctor Who, that's an entity in our knowledge base. And then we would go start chatting about Doctor Who.
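To make the interruption mechanism concrete, here is a minimal, hypothetical sketch (not the team's actual code): every response generator inspects the latest user utterance and may offer a candidate reply with a priority, and an arbiter simply picks the highest-priority candidate, so a generator that detects a topic switch can take over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str
    priority: int  # higher = more urgent, e.g. a detected topic switch


class MoviesGenerator:
    def respond(self, utterance: str) -> Optional[Candidate]:
        if "movie" in utterance.lower():
            return Candidate("What's your favorite movie?", priority=5)
        return None


class TVGenerator:
    def respond(self, utterance: str) -> Optional[Candidate]:
        if "tv" in utterance.lower() or "watching" in utterance.lower():
            # Interrupt with a higher priority when the user signals a new topic.
            return Candidate("Oh, a TV fan! What have you been watching?", priority=8)
        return None


def choose_response(utterance: str, generators) -> str:
    # Every generator "listens" to the utterance; the highest-priority candidate wins.
    candidates = [c for g in generators if (c := g.respond(utterance)) is not None]
    if not candidates:
        return "Tell me more!"
    return max(candidates, key=lambda c: c.priority).text


print(choose_response("I'm not really into movies, I'm more of a TV person",
                      [MoviesGenerator(), TVGenerator()]))
```

The real bot of course has many more generators and a richer priority scheme; this only illustrates the "everyone listens, the most relevant one takes over" idea described above.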

And in fact, what was pretty important in order to support high user initiative is that we really prioritized not just talking about, like, the top most popular entities, you know, people and TV shows and concepts, books and so on. But we tried to cover everything. So we had essentially all of English language Wikipedia available as topics to talk about.

So the idea is that if the user happens to mention something that we have the Wikipedia article for, and if we can figure out the link between what they said and the Wikipedia article, which is definitely a difficult problem, we didn't always get that right. But if we can figure it out, such as Doctor Who, then we can switch to talking about Doctor Who later.

and use the information in the Wikipedia article to talk about the details of Doctor Who with the user. And the cool thing is that we tried to do this for not just Doctor Who, which is a very well-known show, but really minor Wikipedia articles that very few people read. So we were hoping to kind of pleasantly surprise users when they mentioned their weird, niche, little, unknown TV series that we would be able to talk about it too.
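As a toy illustration of the linking step (not the actual entity linker, which as the speakers note is a hard problem in itself), you can think of it as matching spans of the user's utterance against a precomputed index of Wikipedia anchor texts; everything below, including the tiny index, is invented for the example.

```python
# Hypothetical anchor-text index; the real one covers essentially all of
# English Wikipedia and has to cope with speech-recognition noise and ambiguity.
WIKI_ANCHORS = {
    "doctor who": "Doctor Who",
    "the beatles": "The Beatles",
    "animal crossing": "Animal Crossing",
}


def link_entities(utterance: str) -> list[str]:
    """Return Wikipedia article titles whose anchor text appears in the utterance."""
    text = utterance.lower()
    return [title for anchor, title in WIKI_ANCHORS.items() if anchor in text]


print(link_entities("I'm watching Doctor Who at the moment"))  # -> ['Doctor Who']
```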

And do you think you achieved that? What kind of evaluation metrics or qualitative evaluation did you guys do? So I think it's kind of really hard to evaluate something like this qualitatively. So there are a couple of things that we were provided with. One was people rating our conversations. And then there are also some turn level annotations provided.

But on the other hand, we kind of found that even though we could detect a lot of entities and, you know, talk about them, the initiative part of it could indeed be restricted to this area. So for instance, if they then went ahead and asked a really niche question about their niche TV show, then we would probably not be able to answer that. And I think it all kind of boils down to a more subjective evaluation: you see things going wrong, and we saw many instances of something like this. That makes sense. And Abby also mentioned, you know,

Using a huge database of essentially all Wikipedia topics. And that definitely reminds me of the GPT variants and how they've been trained. Could you touch on how your chatbot relates to GPT?

Yeah, so as you mentioned earlier, our technical report is called Neural Generation Meets Real People. And that's because we were really excited this year. One of the main things we wanted to do was to take the recent advances in large pre-trained language models such as GPT-2 and 3, although we didn't use 3 because it's so new.

we wanted to see how well we could use those to talk to real people. And there's such a huge difference between talking to real people who are just, you know, ordinary people or all the different people who use Alexa in their own homes and who have really different expectations for

what they expect to talk about. There's a huge difference between that and maybe a Mechanical Turk experiment where you've given people very precise instructions about what you want them to do and you're able to restrict them based on certain qualifications you want them to have, for example.

So, yeah, it was kind of a very different challenge to the ones that we usually see in research because we needed to be really kind of flexible to these different directions that people would take the conversation. So, yeah, we were interested in seeing how well neural generation could tackle that. So we use neural generation in two ways. So one was...

this component that tried to talk to people about their everyday experiences. So we had all these different kind of conversation starter questions, things like, you know, what are your plans for the rest of today? Or what did you have for lunch? What are you doing this weekend? And actually, as the pandemic unfolded, we tailored them more to the pandemic scenario. So it was stuff like, what are you doing to stay active? Or, you know, how is your home life? Like how you're getting on with your family, if you live with your family, etc.,

And we tried to show some empathy, you know, saying, I know that it's hard for people in the pandemic right now. How are you feeling? We asked people how they were feeling. So the goal here was that we'd invite people to share their everyday experiences and their everyday feelings. And then the goal was to kind of react to that in a warm and empathetic way.

So, yeah, we had, I'd say, some limited success with using the neural generation. One thing I found really interesting in reading the other teams' reports is that there was very far from universal success in using neural generation. Only a few teams actually used it and said that they found it useful enough to keep in the bot, and we were one of those teams. There were quite a lot of other teams that said they tried it, but it just didn't work well enough for them to keep it in the bot. So,

So I think it's still a fairly kind of fragile technology in that it can go wrong in a lot of ways. And there's a lot of kind of basic problems that go wrong, such as asking questions that don't make sense, you know, common sense errors or kind of like social sense errors, really, like not realizing that a person would, you know,

you know, not enjoy a particular experience or something like that. But I think we did have some success. But a lot of that success came down to essentially drawing the boundaries and then keeping GPT within those boundaries only. So there were certain things which it was kind of better at talking about. At least, you know, the version that we had trained, it was better at talking about some things than others. So by quite carefully, like, setting up these

starter questions to lead in a direction that was likely to lead to success, and then figuring out how to detect when to end the GPT section of the conversation because we think that it's no longer going to be able to maintain coherence. That was the way in which we found limited success right now.
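A rough sketch of that kind of guard rail (illustrative only; the turn cap and the coherence check below are invented): the neural module holds the conversational floor for at most a few turns, and hands back control early if its candidate response fails a simple sanity check.

```python
from typing import Callable, List, Optional, Tuple

MAX_NEURAL_TURNS = 3  # invented cap for the example


def looks_incoherent(response: str) -> bool:
    # Stand-in heuristic; a real system would use stronger signals
    # (classifiers, repetition checks, offensive-content filters, ...).
    return len(response.split()) < 3


def neural_turn(generate: Callable[[List[str]], str],
                history: List[str],
                turns_so_far: int) -> Tuple[Optional[str], bool]:
    """Return (response, keep_control); (None, False) means hand off to another component."""
    if turns_so_far >= MAX_NEURAL_TURNS:
        return None, False
    response = generate(history)
    if looks_incoherent(response):
        return None, False
    return response, True


# Example with a dummy generator standing in for the neural model:
print(neural_turn(lambda h: "That sounds like a lovely weekend!",
                  ["What are you up to this weekend?"], 1))
```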

So we only really had GPT talk to people for bursts of around maybe three turns at a time. And I think hopefully over the next few years, we can do a lot better than that if we can get these neural methods to maintain better coherence over several turns. That's really interesting. What did you do to restrain GPT-2? Yeah, so some of it was topic-based, and I think...

With GPT, it's not just GPT as is. We fine-tuned it on different datasets to make it do particular things. So for this one I'm talking about now, we fine-tuned it on a dataset called Empathetic Dialogues, which is particularly targeted towards hearing about everyday experiences and empathizing with them.

So for this particular component, it was more successful at reacting to kind of ordinary everyday scenarios. And then if you try to talk about, for example, particular like world knowledge things, if you try to talk about a particular person or something like that, then it's not going to have the necessary knowledge, especially as we weren't using, you know, the very largest GPT-2 either for latency reasons.
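For readers curious what that kind of fine-tuning looks like in practice, here is a rough sketch assuming the Hugging Face transformers and datasets libraries; this is not the team's training code, the model size is chosen arbitrarily, and the preprocessing is deliberately simplified.

```python
from datasets import load_dataset
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# EmpatheticDialogues: each row has a situation "prompt" and a dialogue "utterance".
raw = load_dataset("empathetic_dialogues", split="train")


def to_features(example):
    # Flatten the situation and the utterance into one language-modeling sequence.
    text = example["prompt"] + tokenizer.eos_token + example["utterance"] + tokenizer.eos_token
    enc = tokenizer(text, truncation=True, max_length=128, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()
    return enc


train_set = raw.map(to_features, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-empathetic",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=train_set,
)
trainer.train()
```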

So when I said constraints, it was about trying to keep it focused on the user's experience and not start talking about other world knowledge. And if it was going to talk about world knowledge, then hand off to another component. So in fact, Ashwin should tell you now about the other way we used GPT-2 to talk about world knowledge. Yeah, so we used the GPT-2

pre-trained model and then sort of fine-tuned it on a data set where people try to talk about world knowledge. So essentially what it means is let's say if you know about some fun fact, um,

and then you want to introduce it into the conversation with your friend, you just don't read it out from the encyclopedia, right? You sort of figure out how does this relate to this conversation and then maybe summarize a bit, take out the irrelevant details and also sort of inject it with your maybe personal opinions or something like that.

So essentially, that's what we use the language model to do, to paraphrase this encyclopedic snippet of external knowledge and then put it into the conversation.
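Conceptually, the model is conditioned on both the retrieved snippet and the dialogue history and asked to continue the bot's turn. Here is a minimal sketch with an off-the-shelf GPT-2; the prompt format and decoding settings are invented for illustration, and the team fine-tuned their own model for this.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # in practice, a fine-tuned checkpoint

snippet = ("Doctor Who is a British science fiction television series "
           "first broadcast by the BBC in 1963.")
history = "User: I've been rewatching Doctor Who.\nBot:"
prompt = f"Knowledge: {snippet}\n{history}"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9,
                        pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```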

So I think we've found a fair degree of success, but the limitations were also quite apparent. So for instance, it seemed like it was able to produce coherent text most of the time, even though sometimes it would make

grammatical errors as well. But on the other hand, there was also this issue where if a language model has been pre-trained on a large amount of data, it has the ability, or the tendency, to sort of hallucinate information into the conversation. So,

So instead of taking the name Abraham Lincoln, it would take the name Abraham and then add something else as a surname and then read out the fact. So I think it...

that leads to sort of factual inaccuracies, especially when these language models start to hallucinate. That's kind of the bad part. There's always a tension between how much you try to paraphrase and make it sound natural, because that's how much editing you are doing, and the amount of factual correctness that you are able to maintain. Right. That makes sense. I can definitely draw parallels to the image domain as well.

So with all that being considered, we know you placed second, but can you describe kind of what aspects people seem to like most or least about the system? And perhaps it could be your own opinions as well.

Well, one thing that I think people were pleasantly surprised by is we had a module that was designed to exchange opinions with people. So we actually went to Twitter and then we gathered people's opinions on various things. So, for example, foods or days of the week or seasons.

I don't know, just everyday experiences. Stuff like, for example, working from home. There are many people tweeting about how they felt about working from home. So we made sure to collect only opinions on pretty non-controversial things like that. We didn't collect opinions on, you know, public figures or...

you know, religions or anything like that that was going to cause arguments. So we collect these things and then the idea is that when we'd be chatting to people about these things, then users could tell us how they felt about it and then we would counter with an opinion of our own, saying like, oh yeah, I love bananas too. They're so tasty and a great source of potassium or whatever. And one thing that people seem to actually enjoy and be rather tickled by was having a playful disagreement sometimes. So we did a study, which you can see in our paper, that showed that

Sometimes people actually found it quite stimulating in a fun way when we would disagree with them. We'd say, oh, actually, no, I don't love bananas because I don't know what the reason is to not like bananas. But people would find that fun sometimes. So I think finding opportunities to do something that people had never seen a chatbot do before, at least in their experience, that was a really fun thing to do for people. Ashwin, was there anything that surprised you? I'm trying to think. So I feel like...

Being able to talk about very niche things was also something that, even though they expect us to be able to do it, if we are not doing it well, then it's kind of bad. But if you are actually doing it well, then it's also another surprising thing. And one thing I realized, or I think all of us learned, was that

We always imagine these systems to be in some sort of a continuum where people know exactly what they do and they are just using them day to day. But I think a big component of this whole last few years is that

These systems are continually improving and a big part of what engages people is their ability to probe and poke these systems. So we found a lot of people trying to push its boundaries, trying to ask it really tricky questions. And I kind of feel like that is one component that maybe we kind of ignore. We assume that it's a steady state, but it's not really that. So it was fun.

Yeah, I think on the topic of user initiative, as I said before, it's something we tried to improve, but we were still very, very far from really properly being able to go with the user wherever they wanted to go in the conversation. So I think that's something that sometimes led to user frustration when they'd want to talk about some topic, but then we didn't have the ability to talk about it properly. Then I think that was disappointing to people sometimes.

Oh, and sorry, another thing that was interesting. When we were trying to do high user initiative and allow people to talk about all of the...

unusual interests they had. Our first attempt to do that was to ask people a lot of questions. We'd say, you know, so what are you interested in? What are your hobbies? What do you like to do? But we actually found that people got really kind of exhausted from being asked so many questions. So we realized that we had to kind of scale that back a bit, and actually having, you know, mixed initiative and having the bot still take a lot of initiative to suggest topics was actually pretty important.

because otherwise you're putting too much burden on the user to make all the decisions.

That's all really interesting. That reminds me a lot of Replika, if you're familiar with that chatbot that I played with. And I recall that I admit to being a little bit of an adversarial user, trying to probe it and see how far I could get, and also feeling a bit exhausted when asked too many questions too frequently. Have you guys seen Replika?

No, I don't think I have. Oh, wow. It's an app and you can download it. It's supposed to be kind of an avatar, but I don't think the graphics side is really there yet. But the chatbot side is somewhat there. And it's really cool because people actually do use it. And a lot of people are using it for, I guess, combating loneliness right now. So how does it work?

It takes on multiple roles. I think the free version is friend, but the non-free version, I think, can be like a mentor or even significant other. And you just chat with it. It's just a chatbot and you can chat about anything. So why is it called Replika? Is it replicating something? Yeah.

Oh, I think people like to think of it as maybe yourself, like you're talking to yourself. But myself is definitely not my best friend. If I could clone myself and that was my best friend, that would not be the most healthy best friend for me. It would just be an echo chamber. Well, yeah, that sounds like we need a new replica then. Or not a replica. We need a new Chirpy. Yeah.

Yeah, I kind of got confused because I searched for Replika and then there was also Her. And then I thought it was a movie. So I went on and checked out the trailer for the movie called Replicas. So I thought you were... I think it's Replika, with a K. Yeah. What do you guys... I'm very curious what you think is low-hanging fruit in this area. Because I'm actually really surprised by the avenues you've pushed on, I think they're amazing. And I didn't think we could...

get to some of these places, especially, you know, going niche or also just getting a sense of or being playful with the user like that. What do you think is low hanging fruit? I know you mentioned a lot of things that are, oh, maybe that's really far off. But what's low hanging fruit that you could see in chatbots across the board soon?

I think one of the low-hanging fruits is the ability to answer questions on a variety of levels. So I think right now, for instance, you can ask very generic questions to Alexa or Google and then they'll be able to answer it. But I think the next sort of step is...

if you are having a conversation about something. So let's say it's about a news event, or about, um, your favorite musician, or, you know, a personal experience you've had, being able to answer questions, um,

is one of the biggest and maybe the easier things to do going forward. Because it seems like one of the places where we could really improve on this mixed initiative aspect is to be able to answer questions whenever the user asks them. It wasn't like the questions were super tricky or very difficult, but they needed the right sort of answers. So yeah.

I'd say that the question of what's low-hanging fruit in conversational AI, in itself, it kind of depends on who you are. So I think maybe more so than other areas of NLP, progress in conversational AI, a lot of it's happening not in a very open domain kind of way, open source kind of way, because I think there's a lot of private companies who are making

progress in this. So, you know, for example, Xiao Ice, the Chinese chatbot, seems to be very advanced. But

it takes a huge amount for some, you know, startup or group of students such as us to reach anything like that level because it just takes a huge amount of engineer hours. You know, you need lots of engineers over years to develop something as polished as that. So I think one of the things that's kind of difficult is that I think there are

like existing techniques that you can put together to build something that adds up to quite an impressive user experience, but it takes a lot of effort to get there. So I would love to see conversational AI technology becoming more open source and more shareable. So we're actually trying to work right now on making parts of Chirpy Cardinal open source.

Yeah.

I think the challenge really is any sort of a chatbot that has no boundaries, which is open domain, kind of leads into sort of the Turing test, right? And then you need to have like a lot of abilities, a lot of different ways of being able to understand, empathize and do a lot of different things. And so, yeah,

I think that if any of these components is missing, then it becomes a gaping hole in your system. So as Abby mentioned, I think one of the biggest issues is

Just to start off, you need to have at least some level of competency in many of these areas. And only then can you start to, you know, improve on and evaluate maybe one of these areas. Right, exactly. So, you know, if you're an NLP researcher, you can say,

All right. I've got an idea for how to improve the state of the art of, let's say, co-reference resolution, right? So you go get your co-reference resolution data set benchmark and you develop your method and then you've got your results. That's great, right? But one thing that's really hard in conversational AI, it's kind of a full-stack NLP problem. You kind of need every part of NLP pretty much. It's kind of AI complete. You need to have every part of NLP in place in order to be able to have

you know, a good human level conversation. So obviously that's not what we're aiming for. We're not yet at the point of having a human level conversation. So sure, we've got less than, you know, cutting edge in many of those departments, but we still have to have something that works decently enough to function with these real people we're having conversations with. So yeah, there's a lot of effort that you have to put in before you can start to, you know, chip away at your cutting edge problem. Definitely. And I can,

see some risks to, uh, keeping things very closed source as well, though maybe it's also a risk of open source. Um, what are some things, and I think you've touched on this a little bit, Abby, when you said that, oh, we wanted to avoid controversial topics. Um, and of course we all know about what happened with Tay. Uh, where do you think, um,

things can go wrong with something like this and maybe have some kind of societal or social side effect? Yes. So as I mentioned, we didn't talk about controversial topics because we were not going to talk about them unless we knew that we could do it responsibly. And

And in the end, we didn't have the time we needed to figure out how to make sure that we could talk about things in a substantial way and avoid saying something offensive. So in the end, we just blocked off certain topics because we didn't want to just leave it up to chance to see what GPT-2 had to say about this extremely hot-button issue.

But yeah, as we mentioned, that was kind of disappointing to users. Users would say that they wanted to talk about something serious like Black Lives Matter. And, you know, we had something that would do a one-time response that just kind of acknowledged that it was a complex topic and that, well, Black Lives Matter, and that we hoped that they were staying safe. You know, we'd say something like that and then they'd want to go deeper, but then we would be unable to go deeper because, you know,

We didn't trust it to do a good job of that. And I think that was disappointing to people. And that applies to many, many other issues that people were trying to talk about as well that were important to them. So I think that really is a missed opportunity in conversational AI systems. And it would be really good to work on being able to address these topics better in the future. Because I think...

Chatbots could potentially be a really great tool in order to talk to people about complex topics in a way that is fact grounded, but also empathetic. So this is essentially what we wanted to do, but we didn't have the technology to do it correctly yet, so we didn't. But I think, especially now that there's a lot of problems with people reading misinformation online and essentially humans giving other humans misinformation kind of on purpose to mislead, and

people's emotions kind of spreading like wildfire on social networks and making people angrier and angrier at the other side and this increasingly divided society. Maybe chatbots could be a good opportunity for something that does not have human emotions to stick to the facts, but also in an empathetic way, make people feel listened to with a social delivery of the information. And perhaps that could be a good way to give people

an opportunity to hear factual information, but in a way that's kind of delivered to them their way. So I think, you know, that's a pretty far off goal.

And it would be great if chatbots could do that, but you also asked about the dangers and I think there's certainly a lot of dangers as well. So the same humans who use the internet to spread misinformation and deliberately anger other humans could probably use chatbots to do the same job even more effectively. So I think that's something we should be careful of too. Yeah, and kind of related to that, I think...

We've seen proliferation of bubbles online. So it's because people sort of want to hear the opinions they agree with. And

One danger with chatbots is, since you want to appease the user, you kind of keep agreeing with whatever they're saying. And that's not necessarily the best way to approach the world. But on the other hand, one good thing that can come out of it is it can sort of connect

groups of people who otherwise would not have had the opportunity or the ability to talk with each other. So if there are two different points of view and you sort of are in a bubble, a chatbot could be a gateway to the other point of view, but in a very nuanced and empathetic way so that

you don't feel like you're being forced to listen to something that you don't agree with, but rather you're talking to a friend who happens to have a much wider worldview than all the other friends that you already have. Yeah, that's a cool point. Um,

And on that note, what are both of you thinking about in terms of future work? I know, Abby, you mentioned working on open sourcing some of the data sets that make Chirpy possible. So I think actually we're more looking towards perhaps open sourcing parts of the code than any of the data. That's the main thing that we're looking at right now. Yeah.

So, yeah, I think for me, that's one of the things. One other thing that I am looking at right now is, as I mentioned, we use the neural systems to paraphrase external snippets of knowledge.

But then how can we do it better? Can we make sure that we are staying true to the facts? Can we make sure that the paraphrase does actually contain the fact? On the other hand, can we also make sure it sounds as conversational as possible? That's one of the things I'm looking at.

Another thing I'm looking at, which Abby kind of mentioned in great depth, was initiative. So how do people take initiative? How can we change our chatbot so that people feel like they can take more initiative, and sort of how do we increase our zone of competence so that we don't really need to limit our ability, like limit the amount of initiative we let people take.

Yeah, so these are the kinds of things that I'm working on apart from just relaxing a bit. Yeah, I'm looking at improving the component I talked about earlier that talks to people about their everyday experiences and tries to be an active listener, you know, acknowledge, show empathy, etc. So I'm currently thinking about how can we ask better questions, right? So if you ask someone...

how was your day today? And they start telling you about something that happened. Then you typically say, oh, you know, that sounds good or that sounds bad. And then you ask some follow-on question to hear more about it. So I'm trying to figure out how can we get neural generator systems, how can we control them to generate better questions and, you know, what makes a better question in this kind of scenario. That's really, really cool. Well, thank you so much, both of you, for joining us on this podcast today.

Thank you so much for having us. This was really fun. Thank you so much, listeners, for tuning into this episode of Skynet Today's Let's Talk AI podcast. You can find articles on similar topics to today's and subscribe to our weekly newsletter with similar ones at skynettoday.com. Subscribe to us wherever you get your podcasts and don't forget to leave us a rating if you like the show. Be sure to tune into our future episodes.