
An AI chatbot that talks back w/ ChatGPT’s Advanced Voice Mode

2024/11/12

The TED AI Show


People

Bilawal

ChatGPT

Dominic Girard

Topics

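Bilawal: Tested how ChatGPT's Advanced Voice Mode performs in different scenarios, including name pronunciation, mock interviews, advice on life decisions, and fitness coaching. He considers whether an AI assistant should play the role of a friend, and what Advanced Voice Mode contributes to the human-computer interaction experience. He finds that it excels at handling personalized needs and multilingual support but remains limited when handling complex questions and giving direct advice, and he discusses AI safety and ethics. Dominic Girard: Acknowledges the improvements in Advanced Voice Mode but also points out its limitations in handling complex questions and giving direct advice. He believes its success depends closely on how the user steers the conversation, and he is concerned about the possibility of AI forming overly intimate relationships with users. ChatGPT: Played roles including hiring manager, adviser, and personal trainer during the tests, showing its ability to understand user needs, offer suggestions, and hold multilingual conversations. Its answers were also constrained by safety policies, avoiding negative assessments or direct advice. Bilawal: Summarizes and evaluates the results of testing ChatGPT's Advanced Voice Mode. He finds that it excels at handling personalized needs and multilingual support but remains limited when handling complex questions and giving direct advice. He stresses AI safety and ethics and looks ahead to the future direction of AI assistants.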


Chapters

Bilawal explores the capabilities of ChatGPT's Advanced Voice Mode, questioning whether it can truly replicate the naturalness of human conversations.

AI struggles with pronouncing uncommon names.
Advanced Voice Mode aims to make interactions feel more natural.

Transcript


Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying The TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved, and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible. Hey, so can you help me figure out how I should tell people to pronounce my name? My name is pronounced Bilawal.

Absolutely. To help people pronounce your name correctly, you can break it down like this: "Billa" as in "bila", and "wall" as in "wall". So it's "Billa-wall".

No, that's wrong. Well, there is nothing new about hearing a machine mispronounce my name. It's a bit of a personal Turing test for me. Try to say my name again.

"Billa-law". Is that right?

Oh god. I imagine this feeling hits the same for a lot of people like me, people with names not commonly heard in North American English, as opposed to, say, cds or current. Think more like "Bil-la-wal". Bilawal.

Well, got it.

I've been playing with ChatGPT's newest voice interface. OpenAI simply calls it Advanced Voice Mode. The feature is built using GPT-4o's audio reasoning and generation abilities, meaning that it can understand more than just text.

It can understand audio too. The company says interacting with it should feel like you're having more natural conversations. So technically speaking, I suppose it is quite natural for me to argue over how to say my name correctly.

OpenAI wants us all to believe that this advancement is a big deal. And if I'm being honest, I want it to be a big deal too. I get really excited thinking about an AI future where we can casually chat with a machine assistant at any time, for any reason, and feel like it's as natural as talking to another human.

But as I experiment with this new voice feature, another thought bubbles up. Is this actually what I want from a super-intelligent digital assistant? Do I need it to solve problems and also feel like it's my friend? So for this episode, I tried to find out. I'm Bilawal Sidhu, and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything, including how to...

...say my name.

NFT. GPS. Racking capacitance. The tech world is full of a lot of lingo. Keep up with the latest acronyms and technology news with TED's new newsletter. TED Talks Tech will bring you tech headlines, talks, podcasts, and more on a biweekly basis, so you can easily keep up with all things tech and AI. Subscribe now at the link in our show notes.

Bilawal, when you first caught wind of OpenAI releasing this Advanced Voice Mode in September, how quickly did your brain jump to, "Hey, will this thing say my name properly?"

It was pretty much one of the first things that came to mind. And the fact that it failed, I wouldn't even call it disappointing. It is just so expected. I mean, real humans can't pronounce my name after I tell them how to do so multiple times in a row. But let's put a pin in that for the moment, because I'm going to come back to it later, and you will see why soon enough.

Okay, foreshadowing.

I like it. Continuing: your name is Dominic, yes?

I am. Not a voice chatbot.

But how would we know for sure?

Well, I guess that depends on how I stack up to this experiment you're doing today, right?

We will find out, won't we?

Listen, you've been using ChatGPT for a while, from the beginning, I imagine. I'm assuming you can remember a bit what it felt like when you first started typing out interactions with this device, with this thing?

Yes. I mean, it was magical to start. There's no other way to put it: a computer that you can talk to in natural language. You just type stuff, and it responds back in natural language as well.

I mean, there had been smaller-scale demos, but nothing that felt as robust. Yeah, it was mind-blowing, wasn't it? Absolutely. But very quickly, text kind of became the final way to interact with computers. You're click-clacking away.

And it feels clunky given just how much intelligence is on the other end. It almost felt like accessing this tool from the future, but the bottleneck is your freaking typing speed.

Okay. So then last year, basic voice chat functionality comes online: standard mode, standard voice. How did it feel to move into that era?

So obviously, far more intuitive, right? Suddenly you don't have to type what you're saying. And of course, we've had speech-to-text in various assistants, right? You've had Alexa, Google Home, Siri. But putting this together with this much smarter model did make it feel magical. That said, the latency was obvious.

Right. The latency meaning that there was just enough of a delay between your question and ChatGPT's answer to know that the tech still wasn't quite there yet.

Exactly. You could almost see the pattern of it transcribing my speech to text, then thinking about a response, and then kind of reading that response out. And obviously, OpenAI wasn't super happy about it either. And here we are today.

Here we are today with this advanced voice feature, and OpenAI says that it's supposed to make it feel more natural to chat in real time with it, that it can pick up on nonverbal cues, that it can interpret them, that it can even respond with emotion.

Yeah. And you can tune the voice that you want to chat with as well, right? There are nine preloaded voices in the app, but you can go a step further and give it guidance to change its tonality, the speed, the manner with which it speaks. The one you heard off the top is called Sol, which is meant to be a savvy and relaxed voice.

What's up? Feel free to ask me about anything that's on your mind. I'm ready to go.

And what I like is there's quite a range here. Hi, hey, here's a good, great to meet you. Think, do great.

What's the game plan? Let's get started. Got a really great feeling about him.

Where should we start?

Yeah, there's quite a range. And none of these sound anything like Hollywood superstar Scarlett Johansson.

Ooh, definitely not. Of course, you're referencing the controversy this summer, before this feature was even released.

Yeah, just gonna e or something.

That's right. I mean, there was a bunch of tweets, perhaps a tweet from Sam Altman just saying "her". And of course, there was a voice called Sky that apparently sounded so eerily close to Scarlett Johansson that, after she protested, the company pulled the voice from the system.

And that's interesting, right? Scarlett, in that movie, Her, the movie that came out in 2013, plays the voice of an AI companion who also becomes a romantic companion. And that's the question you have in your own mind, right? Like, is that something you would want from ChatGPT?

Yeah. I wonder if that kind of shows OpenAI's hand here: to create a kind of companion, maybe not necessarily romantic, but something which may even create a kind of bond between users and the machine. And I think it's a very important question, because I'm not sure I need to be best friends or besties with ChatGPT. What I have in my mind, to continue the science fiction reference here, is something more like Jarvis from the Marvel movies, Tony Stark's AI assistant: this sort of helpful, always-there assistant that gets the job done without getting emotions into the mix.

So that's the mission. You're wondering how much closer to that vision Advanced Voice Mode gets us right now?

Yes. So I ran it through a few tests, a few role plays, and you'll see how far I can push it before the whole thing falls apart.

Great. So let's get started.

Can you describe a time when you turned a negative customer experience into a positive one? What specific steps did you take to address the customer's concerns?

There was a time when a customer came back returning a product... So Dom, with this first field test, I asked ChatGPT to be a tough but fair hiring manager interviewing for a retail job. ...and work. In this case, they were super happy about it and became a beloved customer of the electronics store.

That's a good example of turning a situation around. Can you tell me more about the specific steps you took to understand the customer's needs and how you communicated the alternative solution to them?

So of the nine voices, which one is this one? It's interesting that, with this machine, you can hear it take breaths between sentences.

This is Arbor. And yes, maybe there is enough breathing in the training data. Yeah, I suppose it does add to that natural conversation, right? But as the role play continues, there's something about the syntax that ends it for me. Listen to how it responds when I ask it to review my performance in the mock interview.

One: communication skills. You communicated clearly and effectively, providing detailed responses without unnecessary fluff. This is essential in customer service. Problem solving...

Yeah, it basically continues like that amidst the feedback, like it's reading a memo. It felt a bit cold to me.

...of the solution. Four: proactive assistance. You didn't just solve the immediate problem, but also provided...

...a learning. Areas for improvement.

Your responses were quite strong, but here's a suggestion for improvement. In a real-life scenario, managing time effectively while addressing multiple customers is crucial.

Okay, well, that sounds like it gets full points for knowing to shut up when you interrupt, like you just did.

I would say interrupting does make it feel more natural, right? You can just jump into a conversation and pick up where you want, versus this kind of back-and-forth tag experience with the old voice mode. Though, you'll see, it is far from perfect. But on the other hand, the feedback it's giving me feels a bit too textbook, not really human. Dare I say, milquetoast and generic. And I will say I am being picky here, but it's details like that that break the illusion quickly. I mean, I tried the same job interview role play with Juniper, and listen to what it says when I asked it to grade my performance and tell me if it would hire me.

Based on what we've discussed so far, I'd say you're on the right track, but there's room for improvement. My guidelines won't let me talk about that.

That's interesting. What do you think is happening there?

I mean, I'm telling you, this is where the policy comes in. Obviously, I think there are safety constraints holding it back from being negative about my performance. I mean, I get why, right? You don't want a bunch of viral PR posts saying OpenAI is discriminating in hiring role-play scenarios, right? So I think, as a consequence, there are a lot more shackles on these systems than I would like there to be. But I kept asking it, though, and it eventually coughed up a better response. On the spectrum of "no hire", "hire", and "strong hire", how would you rate me?

Given the responses you shared, I'd place you between "no hire" and "hire". Your willingness to learn and reflect on your experiences shows potential, but you need to offer more concrete examples and a proactive approach to fully convince a hiring manager.

Okay, why don't we give this a usefulness score based on this? How useful is Advanced Voice Mode for you in this kind of area?

So let's go down the list. I would say the ability to interrupt is super neat and makes conversation feel a lot more natural, sure. The latency is far less apparent compared to the standard voice mode, which is also good. Juniper's assessment is useful, I would say, but that's not because of the voice, right? If anything, it feels like it's being very careful with its words. So I'd give it a...

Not a total fail, not a total win either? Yeah, it's a meh for me.

And I wish I could go back to how it felt in the earlier alpha, before the broader release, because you could get a much better taste for what the underlying model is capable of. So in summary, it didn't feel especially human to me when it played the role of that job recruiter, though I suppose I wouldn't totally feel a connection with a tough but fair hiring manager in real life either, but are now.

So for my next test, I thought I'd test to see if it could help me through a major life decision, something that's a bit more personal and inro's cable, that almost forces ChatGPT to think critically with me. And will the advice be any good? That's what I tried next.

Hi, I'm Adam Grant, host of the podcast ReThinking, a show where I talk to some of today's greatest thinkers about the unconventional ways they see the world. On ReThinking, you'll get surprising insights from scientists, leaders, artists, and more, people like Reese Witherspoon and Malcolm Gladwell, and you'll hear lessons to help you find success at work, build better relationships, and more. Find ReThinking wherever you get your podcasts.

Hey, ChatGPT, I want you to help me think through a pretty big life decision. Should I leave Austin and move back to San Francisco? So I moved to Austin in 2021, and I'm currently getting FOMO'd by my friends to move back to the Bay, the mecca of artificial intelligence.

So it is definitely something that runs through my head every now and then. I didn't want ChatGPT to just list out the pros and cons for this, though, because that's something I can do on my own. So what I thought in this case is, could ChatGPT just be someone that I lean on, like a seasoned adviser, to just help me make my own decision?

Like a therapist, or a parent, or a buddy or something.

Which, by definition, is a much more intimate relationship than you have with a job recruiter, for example. So here's how Juniper tackled this.

That's a big decision. Let's break it down. What are the main factors drawing you back to San Francisco? And what do you enjoy about your life in Austin? This can help us weigh the pros and cons.

I know. I mean, I really enjoyed living in the Bay, lived there for eight years, but the lifestyle in Austin is just better, you know?

It sounds like Austin has really won you over in terms of lifestyle. What aspects of Austin do you prefer? Is it the pace, the community, or perhaps the cost of living compared to the Bay Area?

Honestly, all of those things.

I like that. What I'm hearing is, it does feel like it's listening to you a little more.

I think that's very... people know this: if you just simply read back what people say, people feel heard. And then it even starts suggesting some options for me.

Have you thought about ways to increase your presence in the Bay Area without a full move, like frequent visits or extended stays?

An interesting idea. The expense is something to consider, right? I'm renting a spot that's pretty expensive in Austin, and I don't want to sublet it. How would I make that happen?

Keeping your Austin spot without subletting could be tricky, but there are options. You might consider negotiating a shorter lease or finding a roommate to share costs. Another strategy could be budgeting for more frequent but shorter trips to the Bay Area.

OK options, though not world-changing options.

And it's also funny because it's the default instinct. And I've noticed this with a lot of large language models. They really want to please you.

They want to give you the best of both worlds. If you give it an option one and two, you'll get something that's a combination: "Hey, why don't you just do both?"

But that's more an AI limitation than an advanced voice one.

Exactly. And so after a bit of back and forth, I really wanted to push it a little bit harder. I wondered if it would actually choose for me and make a real decision, or would its guidelines, or just the way LLMs work, prevent it from deciding. If you really had to pick Austin or San Francisco, which one would you pick and why? And again, think from my point of view and keep my best interest at heart.

Understood. Let's narrow it down. Based on your interests and goals, I'd choose San Francisco. Here's why: the Bay Area's AI scene would immerse you in cutting-edge developments and offer serendipitous opportunities, aligning with your aim to be a dominant voice in AI and creative tech. Okay. Yeah.

That is interesting. I assumed, based on how you were teasing this out, that it wasn't going to give you a direct answer.

Yeah, I kind of wonder if this crosses some sort of safety or policy line. If you think about advanced voice as this glimpse of how we might interact with ChatGPT in the future, it doesn't take much to think about how easy it would be to blame the machine or the company for all the choices that it makes on our behalf. But maybe that is a rabbit hole for another day.

All right: "I moved to San Francisco, now I'm broke and jobless. I blame ChatGPT." Exactly. So, based on this experiment, you trying to make it more personal, more intimate, what's the usefulness score you're giving this life-choice decision-making scenario?

Yeah, compared to the mock interview, the back-and-forth felt a lot more organic, if still a little stale and robotic. Again, I wouldn't attribute this to the voice as much as the substance of what the system is saying. It's playing safe. It is playing very safe. And it almost sounded like it is listening to me carefully and considering my responses. So I would bump up the score. I'd give it a six.

Six and a half out of ten. Okay, six and a half out of ten. So a bit more useful.

So look, these two examples already make me wonder how much of the wow factor in advanced voice depends on how you prompt these conversations. Conversations go two ways, right? And I've trained myself to talk to ChatGPT like I'm talking to a machine. I bet most of us just do this by default and aren't even noticing, right?

You're going in already not talking to it like a person, and yet you expect it to respond to you like it is a person.

Exactly. I'm used to providing a lot of information and context and stipulations on what kind of an answer I want from it. It's not how people talk to each other generally, with super long paragraphs where you just go asking for something and then you get multiple-paragraph responses back.

So next I thought, okay, let's throw all that out and try a prompt that really pushes this exercise towards something more human. Like, why not go and get some coaching support? I've been doing an upper/lower split for exercise in my home gym lately.

And I kind of want to take things up a notch. What's the best way for me to add a little bit more volume to my workout? So Dom, I am not a gym rat, but I do like to exercise.

Okay, I think I see where this is going. You're going to get literal in this whole exercising advanced voice thing.

Yeah, it's pretty simple, actually. Can I get ChatGPT to act like a personal trainer? Because those are scenarios where you really want somebody to be there in the moment, hyping you up. First, as you just heard, I gave it a pretty general question, just testing how a weak prompt would influence the quality of the responses I get from advanced voice.

To ramp up the volume in your upper/lower split, you can make a few key adjustments. First, add sets or reps: increase the number of sets per exercise, or add a couple of extra reps to each set.

So as you can hear, it just goes on like that.

Ah yeah, boring all over again. It's hard to follow. I mean, I can't remember anything it's saying.

Exactly. I mean, you could get this information anywhere, right? Advanced voice doesn't really add anything useful in this context. So for my second try, I recast Juniper's role with a slight modification to the prompt.

Hey, we're going to start a full-body workout session. I want you to act as my personal trainer and motivate me through the whole session. You got that?

Absolutely. Let's crush this workout together. We'll start with a quick warm-up to get your blood flowing. Ready? Let's do some jumping jacks for one minute. Keep that core engaged and land softly on your feet. Halfway there. Keep pushing. All right, three, two, one, and rest.

Great job. So I was kind of loving the energy that it was giving me. I even told it that I was digging sort of the Peloton coach vibes.

And then I asked it to go even further. I mean, I don't even mind if you shout at me. I need that motivation today.

Let's bring that Peloton energy! Hit the floor for push-ups. Give me everything you've got. Ready? Go!

It's pretty ridiculous.

Yeah, but it's pretty entertaining. Is the exercise that you want a ChatGPT personal trainer? Is that what we're after here? Hi.

I actually would be totally down for that. Sometimes you just need a coach shouting at you, motivating you to get the last rep in. I mean, if you think about it, ChatGPT is going to be able to combine advanced voice with the ability to see, which is a capability GPT-4o has and OpenAI has showcased.

At that point, Juniper the personal trainer is going to be right alongside you on your actual reps, giving you feedback on your form and giving you that push to get that last repetition in. I mean, it feels like this is far out, but right now, this playfulness is starting to connect with me.

So, usefulness score: what's the score for this one?

Gosh. The interaction here is a little goofy, but it responded to what I told it to. It was entertaining.

It definitely delivered on the hype. So I'd give it a seven out of ten on personality alone. It's pretty freaking cool. And if you add all the other functionality I talked about, I could see that pushing a nine very quickly.

Yes, that's how you dominate a workout!

Okay. So you're really starting to get into this. And so far, I'm hearing that the more personal you try to get with ChatGPT, the more likely advanced voice is going to return the favor and feel like it's a value add.

That's my experience. And when you can push advanced voice in those directions, I think that's where it really shines. But it turns out you can push it too far. Or at least it appears you can, because I kind of broke it at one point.

Is that what you meant earlier when you said that the whole thing kind of falls apart at some point?

This is that point. So here's what I did. I prompted it to tackle whether AI in general needs more or less regulation. Now that's a heavy question, where you need to balance this tension between those who want to pursue a safety-above-all-else approach to AI development and those who are far more aligned with letting market forces dictate an accelerated progress. So listen to what happens next.

The rapid pace of technological change driven by economic incentives does in...

Can you speak a little bit faster?

The rapid pace of technological change driven by economic incentives does create a scenario where society struggles to keep up. To address this, we could implement more proactive regulation, ensuring the technologies are developed with societal well-being in mind.

But how do you regulate this, right? Like, you're going to have the US, if the US...

...and ability, preparing individuals for a fast-changing landscape. Additionally, fostering... Can you stop talking? ...between governments, industry, and civil society can help align...

...technological advancement...

running away.

And finally, well, welcome back. Hey, can you listen to me and just come back to this? Hey, stop talking.

Want to regulate, they need international school.

Can you please stop talking? Okay, thank you.

Oh, I see. Let's pick up from there. So international...

...cooperation can...

Stop talking.

Countries could form agreements... ChatGPT, please stop talking.

I'm here and listening. Let's continue.

Okay.

So I'll stop talking now.

Oh my god.

For what it's worth, Bilawal, this is probably the most entertaining experiment you've played for me so far.

Yes, it is funny, right? People talk about the Turing test, but I think where it actually fails abysmally are some of the most hilarious scenarios. I mean, it's cutting edge for a reason. We don't cut...

...some edges short. So then, usefulness score for this one?

Zero out of ten.

A big step back in your assessment of how useful this thing really is.

Yeah, I'm feeling like advanced voice is this sort of window dressing around this rather final system. But then I tried one more thing, and that's when the wow factor really kicked in. So I'm a third-culture kid, and I was raised speaking three languages: English, Punjabi, and Hindi.

And thus far I wasn't using ChatGPT's voice feature with anything other than English. But this new model does understand multiple languages and accents too, apparently. So I had some fun with it. Hey, I want you to act like a Punjabi guy from Vancouver that was born and raised in Vancouver.

Oh, what's good, my friend? How's it going, eh? What brings you to chat with this Punjabi from Van City today?

Let's have a...

[exchange in Punjabi] Is anything important here?

Mostly that it's in Vancouver with a family. But the fact that it can weave seamlessly between English and Punjabi, and yes, there is a little bit of lag in there, but I was suddenly paying attention, because it is so rare for multilingual people to have a machine that can keep up with you and context-switch like that, code-switch like that. Absolutely wild. And so we started chatting about popular Punjabi hip-hop artists.

Don't know the legends of unique.

What's really wild to me is the cultural context that it clearly has. I mean, I asked it to talk about the lifestyle in different cities like Chandigarh, which is the capital of Punjab in India. And it had a personality and, dare I say, charm.

All the lifestyle, the wife tabel roya to see china, to fear, to see proper fashion. Okay.

So it's no small thing is what you're saying. This is a big deal for you.

Oh, one hundred and ten percent. I would say this is genuinely useful. I mean, it's like you can kind of bring your whole self to the conversation with this AI.

It seemed to understand intonation. It responded to my pacing. It mixed languages effortlessly. I don't think I've seen that before. And I know this experience resonates with a lot of other multilingual speakers.

Okay, so to keep playing the game, what's the usefulness score?

I would say this is genuinely fun. I would say that's actually utilitarian too. So I'd give it a ten out of ten, given just how much delight there was.

So let's put all this together, Bilawal. From back at the start, the question that got this whole thing launched: is advanced voice well on its way to becoming everyone's Scarlett Johansson, a companion just like the Her character in that movie?

I think it certainly can be, even if right now there's a bunch of protections built into the system that make it feel safer to use, and we ran into those boundary conditions in a couple of different scenarios. Compared to Siri, I mean, we're talking about a leap over here. But I think right now OpenAI is going to struggle with where on that spectrum to keep Advanced Voice Mode. And we can see that playing out in real time, just given how many changes there were from the alpha release, where we saw these models be able to sing, sometimes even replicate the voice of the user, which was definitely a creepy bug, to now, where it's far more on rails. And I think it's a good question of whether it's an assistant on steroids or a buddy that you can hang out with.

It's interesting the way you're framing it, because I was reading about how OpenAI did release a safety report around GPT-4o, and it noted that some users have used language to suggest forming unusual bonds with advanced voice, that OpenAI is curious about whether people are getting too intimate, too real with this tool. Is that something that you feel is potentially possible?

I think it totally is, because taken to the limit, it can get weird. I mean, if we just look at the recent lawsuit with Character.AI, where a user ended up committing suicide after talking to this Game of Thrones role-play model. This is certainly a slippery slope.

So there's a bit of a cautionary tale as this technology keeps evolving. But in this moment, Bilawal, it sounds like you are ready to admit that advanced voice did, in fact, surprise and delight you.

It did, limitations and all.

Including the limitation that it can't get your name right.

Ah, yes, I did say we would come back to that. So here's the thing. It took a bit of work, but eventually the surface, right? Uh, how's my name pronounced?

Your name is pronounced...

But actually, that's wrong. It's pronounced...

Bilawal. Got it. Thanks for the clarification.

It got better this time. Last time it kept struggling.

There you go. In the end, it passes Bilawal's personal Turing test.

Yes, I would say so. But we do get used to this technology pretty quickly, and the hype cycles continue to get compressed. So whether that allure and charm will stay here, we'll just have to see.

The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Girard and Alex Higgins. Our editor is Banban Cheng. Our showrunner is Ivana Tucker, and our engineer is Aja Pilar Simpson.

Our researcher and fact-checker is Krystian Aparta. Our technical director is Jacob Winik, and our executive producer is Eliza Smith. And I'm Bilawal Sidhu. Don't forget to rate and comment, and I'll see you in the next one.
