
GPT-Neo, Wav2Vec-U, Deepfake Dubs, Michelangelo AI, History of Ethical AI at Google

2021/5/28

Last Week in AI

People
Andrey Kurenkov
Daniel Bashir
Topics
Daniel Bashir: This week's AI news focuses on autonomous driving technology, the large language model GPT-Neo, and Google's AI progress. Apple has increased its number of self-driving test vehicles, reduced its number of drivers, and applied for a patent on a night vision system. A Waymo Level 4 robotaxi ran into an obstacle while making a right turn, couldn't handle it, and ultimately required human intervention. Eleuther AI released GPT-Neo, an open-source GPT-like language model that offers a free alternative to GPT-3. At its I/O developer conference, Google announced two AI models, LaMDA and MUM, aimed at improving natural language processing and search.

Andrey Kurenkov: GPT-Neo is an open-source, GPT-3-like large language model that gives researchers and developers a free alternative. Its performance falls short of GPT-3, but it does well on some tasks. Facebook's Wav2Vec-U model can perform speech recognition without supervision, which should make it possible to train on far more unlabeled speech data. AI technology developed by the company Flawless can produce high-quality deepfake dubs, translating films and TV shows without losing the actors' original performances. The Florence cathedral museum is using the Michelangelo AI chatbot developed by the company Coelho to enhance its online interactive experience.

Sharon Zhou: An article on the history of Google's ethical AI team highlights Margaret Mitchell's and Timnit Gebru's contributions to the team and the negative impact of their firings. Twitter studied bias in its image cropping algorithm and, based on the findings, improved it by letting users control how images are cropped, a positive example on the AI ethics front.


Chapters
This chapter covers recent advancements in AI research and applications, including GPT-Neo, an open-source alternative to GPT-3, and Facebook's unsupervised speech recognition model, Wav2Vec-U.

Transcript


Hello and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what's just clickbait headlines. This is our latest Last Week in AI episode in which you get a quick digest of last week's AI news, as well as a bit of discussion between two researchers about what we think about this news.

To start things off, we'll hand it off to Daniel Bashir to summarize what happened in AI last week. And we'll be back in just a few minutes to dive deeper into these stories and give our take. Hello, this is Daniel Bashir here with a weekly news summary. This week, we'll look at two stories on autonomous vehicles, a GPT-3 alternative, and Google search. We haven't heard many Apple car rumors over the past few months, but a recent story says the company has increased the number of self-driving cars and halved the number of drivers licensed to drive those cars.

As 9to5Mac reports, Apple now has 68 self-driving test cars and 76 drivers. In March, Apple was granted a patent for a night vision system that combines visible light, near-infrared, and long-wave infrared sensors, which would allow the Apple Car system to see three times farther at night than a human driver. Apple is reported to have discussed possible partnerships with a number of established carmakers, including Hyundai, Nissan, and BMW.

Our next story, from The Robot Report, concerns a particular ride that YouTuber JJ Ricks, the most prolific documenter of Waymo One Level 4 robotaxis, took on May 3rd.

When his robo-taxi needed to make a right turn onto a multi-lane main road, it found the right lane closed off by orange construction cones. The confused vehicle couldn't figure out what to do and called for roadside assistance. But before that assistance arrived, the robo-taxi pulled out onto the road, only to stop again, blocking traffic. It took off again two more times before the roadside assistance employee could actually get in the car, take over, and complete the ride.

The confused vehicle shows that there are still many challenges to getting self-driving vehicles to a usable state. If you've been following the GPT-3 saga, you know that you can only access it through OpenAI's API, which is currently only available to those whose requests for use are approved. A while ago, a number of AI researchers and engineers founded Eleuther AI, a group working on open source AI technology.

Among its first endeavors has been creating an open-source GPT-like language model called GPT-Neo. The group uses idle compute on the TPU Research Cloud, a Google Cloud initiative that supports research projects with the expectation that the results of that research will be shared openly. This March, after months of research and training, the Eleuther AI team released two trained GPT-style language models that can be used for free with the Hugging Face Transformers platform.

They don't perform nearly as well as the largest version of GPT-3 for some tasks, but prove to be a good free alternative. Finally, at its I/O developer conference last Tuesday, Google announced a number of ways it is moving forward with AI. As Vox reports, two of the biggest announcements are in the realm of natural language processing and search. LaMDA, which stands for Language Model for Dialogue Applications, makes it easier for AI systems to have more conversational dialogue.

Multitask Unified Model, or MUM, is an AI model that boosts understanding of human questions and improves search. Google aims to have AI systems take on more of the work humans usually do. Rather than having to use multiple queries to answer a series of questions, you could use one more sophisticated question. Google is looking at integrating LaMDA into its search engine, voice assistant, and Workspace.

MUM, on the other hand, is designed to understand implicit comparisons in a search query, like how to prepare for hiking different mountains, and provide the most appropriate answer. That's all for this week's news roundup. Stay tuned for a more in-depth discussion of recent events.

Thanks, Daniel. And welcome back listeners. Now that you've had a summary of last week's news, feel free to stick around for more laid back discussion about this news by two AI researchers. I'm Dr. Sharon Zhou, a graduating fourth year PhD student in the machine learning group working with Andrew Ng. I do research on generative models and applying machine learning to medicine and climate.

With me is my co-host. Hi there, I'm Andrey Kurenkov, a third year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation and reinforcement learning.

And if you're a regular listener, this week might be a bit interesting because we're changing things up a bit. Unlike our usual kind of process of just talking about a few different news stories, you know, we'll try to change it a bit by having sort of

different flavors of stories every week and, you know, a bit of each thing. So we're going to talk about AI news regarding research, AI news regarding applications of AI, and AI news that has to do more with sort of the societal impacts slash ethics of AI. So hopefully it'll be fun to have a bit of each as opposed to sort of a random selection every week.

So we're going to go ahead and start with applications or new AI research. And to start things off, an article from VentureBeat titled GPT-3's free alternative GPT-Neo is something to be excited about.

And so what is this GPT-Neo? So GPT-Neo is a GPT-3-like model, so huge language model that takes a lot of resources to train, both money and time. And it was created by Eleuther AI, which is a team that was trying to replicate GPT-3 in some sense and make it open source and accessible to everyone.

And so they have succeeded in launching that. And the article basically details that, you know, it actually has billions of parameters. And it is out and open source and you can use it. And what's cool is that the weights are...

released. So you don't actually need to expend a lot of that compute power anymore. And now we can, you know, be greener as we move forward in the AI community and hopefully just have that API. Exactly. And, you know, there is actually a Hugging Face API, which is a popular library for this sort of model. So it's...

kind of very accessible to try this out. The only limitation, of course, is you need the hardware to run it, probably in some sort of cloud instance, which is not cheap. But otherwise, in terms of code, in terms of setting it up, it's much, much easier than it has been to do with this scale of language model in the past. Now,

I guess the limitation is that this is still not huge. It's what, 2.7 billion parameters as opposed to GPT-3's 175 billion or so? Yeah.

So it's a little more limited, but at the same time on their metrics, they show that it's performing pretty well, performing not quite as well as GPT-3 kind of across the board on the quantitative metrics. It's not as successful, but it still is performing impressively. And then qualitatively, what got people so excited about GPT-3 was its ability to do really...

coherent, well thought out completions. So, you know, it can tell you a story about scientists discovering a unicorn. And the neat thing is in this VentureBeat article, they include that from GPT-Neo and it does a pretty good job.

So I think quantitatively, this is definitely a step towards GPT-type technology being available to anyone and not just OpenAI. And furthermore, this group is already working on a larger scale GPT-NeoX, which is more like GPT-3. So that's

I think for anyone who has hoped that larger models would not just be proprietary to companies that develop them, this is a pretty good sign. And before we go on, maybe let me just read you a bit of this unicorn story that GPT-Neo generated. So as a prompt, and this is a prompt that GPT-3 also had, that was this text.

In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley in the Andes Mountains. Even more surprising to the researchers was the fact that unicorns spoke perfect English. So starting from there, GPT-Neo completed as follows. The only evidence was a single word, hello.

The unicorns are very good listeners, so they learned to speak English from watching humans, said Dr. David Orme, chief executive of the Zoological Society of London in a press release. In fact, they also learned to speak Spanish from watching humans. They said hello to us in both languages. While the discovery is stunning in and of itself, the news gets even more mind-blowing when you consider that the unicorns were there for thousands of years.

They were living fossils, Orme said, which is a fancy way of saying that their origins have yet to be determined. And so on. So it's, you know, very grammatically correct, thematically consistent. You know, it's...

not obvious if you haven't seen this sort of thing before that is generated by AI. And it's very neat that, you know, now this sort of thing is open sourced. The only thing I really, really want is for it to have, instead of saying the only evidence was a single word, hello, like the language element of unicorns, like actually knowing that unicorns, like it's the horn part, you know, that is odd, you know, or that cues us in that it's a unicorn.

That's true. Yeah. As it is, it just treated unicorns as an arbitrary animal. Yeah. And its specialty is that it can speak or learn human languages or something. Yeah. Yeah.
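For listeners who want to try GPT-Neo themselves, here is a minimal sketch of what using it through the Hugging Face Transformers library mentioned above can look like. The model identifier points at EleutherAI's released 2.7-billion-parameter checkpoint, and the prompt and generation settings are just illustrative choices, not anything prescribed in the episode:

# Minimal sketch: text generation with a released GPT-Neo checkpoint
# via the Hugging Face Transformers pipeline. Settings are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")

prompt = ("In a shocking finding, scientists discovered a herd of unicorns "
          "living in a remote, previously unexplored valley.")
outputs = generator(prompt, max_length=120, do_sample=True, temperature=0.9)
print(outputs[0]["generated_text"])

Running this downloads roughly ten gigabytes of weights, which is why the hosts note that a cloud instance or a machine with plenty of memory is the realistic way to play with it.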

Well, on the topic of text, the next article is about both text and speech. And it's titled Facebook's Wav2Vec-U Learns to Recognize Speech From Unlabeled Data. And this article is also from VentureBeat.

And so basically, Facebook recently announced that it had trained an AI model that can do speech recognition without supervision, that is, without telling it, you know, what each phrase was actually transcribed as.

And so that's a big step in the unsupervised speech recognition space in terms of its ability to learn that, because this means that it could leverage a lot more data because a lot more data is not transcribed than is transcribed. There's a lot more speech data out there without transcriptions. So that's pretty exciting. Yeah, I think in particular, this is cool because...

Creating transcriptions of speech is pretty expensive. So, you know, you've had ImageNet type data sets with millions of images for a while, but paying people to listen to audio and describe hours of it is much more difficult or at least more costly and time consuming. So it's harder to scale.

And most of these state-of-the-art models for languages like English use really, really huge data sets of thousands of hours at least, if not more. And not only is that very expensive to make, but also unlike a lot of other problems, there's not really many public data sets for speech recognition. There's a couple of public ones.

But even those are pretty limited and much smaller than what these companies have internally. So yeah, I think even though this still won't outperform what we have commercialized,

as a research project, it's pretty impressive because it means that for languages where you don't have huge data sets, or for companies that want to build something but don't have these data sets, this points to a new technique that can work really well while not requiring the same investment or the same initial starting data. And I think it's just also very neat. Like,

This is not a problem where I've been aware of unsupervised learning being applied. And in general, it seems unintuitive, I suppose, that you can train a model to go from speech to text without having annotated speech with the corresponding text. So personally, I thought this was pretty neat.

Yeah, I agree. And we looked at the paper, the architecture actually looks fairly simple, using a GAN, if you guys are familiar with that. So yeah, it's not super complex, but hopefully it will help with ASR systems, automatic speech recognition systems, down the line, especially for, you know, I guess startups coming out that don't have as much data. And even for large companies that do have a lot of data, it could cut a lot of costs too.

Yeah, exactly. And I guess the other thing is we have seen in recent years that self-supervised learning has really proven to be incredibly powerful with things like GPT, where you can just throw data at it without having labels associated with the data. And that could also be the case here, that

you know, since you have so much more unannotated speech data and unrelated text data, this could actually be the way towards better performance, ultimately, if this technique can be scaled and built upon in a similar way to what has happened with language models.
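To make the adversarial idea a bit more concrete, here is a toy sketch of the setup described above: a generator maps stand-in speech representations to phoneme distributions, and a discriminator tries to tell those outputs apart from unpaired real text. Everything here, the shapes, the random data, and the tiny networks, is invented for illustration; it is not Facebook's actual Wav2Vec-U code or architecture:

# Toy illustration of adversarial training on unpaired speech and text.
# All sizes and data are made up; this is not the real Wav2Vec-U model.
import torch
import torch.nn as nn

NUM_PHONEMES = 40   # assumed phoneme inventory size
SPEECH_DIM = 512    # assumed dimensionality of pre-extracted speech features
SEQ_LEN = 50
BATCH = 8

# Generator: maps unlabeled speech representations to phoneme distributions.
generator = nn.Sequential(
    nn.Linear(SPEECH_DIM, 256), nn.ReLU(),
    nn.Linear(256, NUM_PHONEMES), nn.Softmax(dim=-1),
)
# Discriminator: judges whether a phoneme sequence looks like real text.
discriminator = nn.Sequential(
    nn.Linear(NUM_PHONEMES, 128), nn.ReLU(), nn.Linear(128, 1),
)

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    # Unpaired data: speech features and phonemized text drawn independently.
    speech = torch.randn(BATCH, SEQ_LEN, SPEECH_DIM)
    real_phonemes = torch.randint(0, NUM_PHONEMES, (BATCH, SEQ_LEN))
    real_onehot = nn.functional.one_hot(real_phonemes, NUM_PHONEMES).float()

    fake = generator(speech)  # "transcriptions" proposed from speech alone

    # Discriminator step: real text sequences vs. generated ones.
    d_real = discriminator(real_onehot).mean(dim=1)
    d_fake = discriminator(fake.detach()).mean(dim=1)
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: make its transcriptions indistinguishable from real text.
    g_score = discriminator(fake).mean(dim=1)
    g_loss = bce(g_score, torch.ones_like(g_score))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

The real system additionally builds segment-level representations on top of self-supervised wav2vec features and adds auxiliary penalties, but the core trick, learning the speech-to-text mapping only from the discriminator's feedback on unpaired text, is what the hosts are reacting to here.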

Right. And speaking of speech, I guess we're going to transition to our applications section of this conversation. And the next article is titled Deepfake Dubs Could Help Translate Film and TV Without Losing an Actor's Original Performance.

And this article is by The Verge. And so the AI startup Flawless has come out with several demos showing that it can do dubbing and specifically it actually doesn't do the speech. So speech is still dubbed by another voice actor in another language, but they're able to do lip syncing with the mouth.

and make it so that it looks like, you know, the dubber for, let's say, Tom Cruise's new movie can speak over him, but then his lips still match, let's say, the Spanish or the German dub. So that's really cool and exciting. We'll see where that goes. Again, this is using a GAN. So for those who are following, this is a visual thing. But of course, the article very much views it as, you know, yeah, deepfake dubs for enabling this type of

technology within entertainment. And I definitely can see it happening. It needs, you know, a quality bump, I believe, but I could see it happening in the future for sure. Yeah, I was pretty impressed by this. Just looking at their demos, I think it's already fairly, you know, impressive. And it seems like something that, you know, is kind of a no-brainer that you'd want to

If you can do it with a reasonable amount of quality, it would just improve the final product when localizing into different countries. And

Looking over videos, I thought it was actually pretty good. Again, this only applies to the lips. It doesn't do the usual deep fake of generally everything. As a result, I think being more focused on this one thing of how people speak, it already looks pretty realistic and pretty good. So

Yeah, this is an application of AI technology in entertainment that I wasn't aware of until this article. But now that I've seen it, you know, I would not be surprised if it became very commonplace very soon. Right. Yeah. But on to another application of AI that might be commonplace, maybe already is somewhat commonplace, but also more on the kind of fun side, not quite as serious or...

you know, maybe high investment. We have this article called Quizzing Michelangelo AI from The Florentine (theflorentine.net). So this is a very local story. This isn't like any sort of industry-wide story. It's just something fun we found related to AI. And so the story here is that tech company Coelho approached the Museo dell'Opera del Duomo with the idea of an artificial intelligence Michelangelo.

So basically the idea is now, you know, museums had to invest in much more online presence going on virtual tours and stuff like that over the last year. And so that was kind of the idea here is that if you go to the Duomo Museum's website, there's going to be this Michelangelo AI that will pop up in the bottom right corner of the screen. And this is pretty much just a chat bot.

So the idea is you can ask it things about Michelangelo's life, about the museum and so on. Now, this is not the super advanced chatbot. It's mainly catered to specific questions. It can support up to like 7,000 of them. But still, I think it's pretty cute and kind of a fun thing.

a nice way to show that not everything in AI is high stakes. Sometimes you can just have something kind of, uh, more on the fun side. It's cool that it's seeping into something like museums, which, I feel like we think about museums as, you know, like, oh, it's about old and ancient relics, and then there's something so new about this. And of course modern art museums are very different and there's new stuff in them all the time, but this is like

I don't know, something we see as bleeding edge technology, and it's cool that it's so integrated already and adds like a very artistic flair to a museum's website, and an interactive aspect too.

Yeah, exactly. And it's interesting, actually, the article also says that this was sort of initially not what they wanted. So it says here that the Duomo, according to Coelho's founder and CEO, Francesco, initially did not accept the idea because it was too modern for them. But then when COVID-19 struck, they needed to modernize.

And so now they've accepted the idea, to try and modernize and adopt more kind of technology, which it sounds like these sort of more traditional or more long-lasting museums haven't done as much of. It's fun. I actually just went to the website and started talking to it, to Michelangelo AI, and

uh, it's not too sophisticated or anything, but at the same time it is sort of fun. Uh, you know, it tells you, what do you want to ask? And then you can say something simple like, when were you born? And it gives you a pretty detailed sort of life summary, like: I was born on Monday, March 6, 1475, four hours before sunrise, in Caprese, not far from Arezzo, in Tuscany.

So, I don't know. This also makes me wonder if we could have chatbots to ask about Wikipedia stories or something to just query for information instead of just looking it up ourselves.

I'm playing with it too. And it says, look at my self portrait in my late Pieta. Do I look like a greedy and stingy person? And then I asked him, do you eat pizza? And he said, I had a spare and simple diet often eating only a piece of bread.

But when I lived in Rome, I had my brother send me, from Florence, cheese, oil, and other Tuscan products. I used to drink wine to get energy when I had to work hard. I did not have particularly refined taste. And then he sent me a picture of himself and said, do I look like a greedy and stingy person? Dude, a selfie. That is really probably the oldest selfie I've ever seen. Yeah, it's fun that it also kind of carries on a conversation. So yeah.

After I asked it, it also prompted me with, do you know what is one of the most beloved buildings in Florence apart from a cathedral? Yeah, it continues prompting you. That's cool. Yeah, exactly. Very cool. Yeah, and I think it sort of speaks to, you know, I think some art museums sometimes can be a little intimidating or maybe less fun, you know, very, very...

serious, self-serious, right. And, uh, this does sort of say or indicate that maybe it would be a nice idea to make the experience of appreciating art also more fun and interactive and, you know, appeal to the kids that way and so on. Yes, it does, it does appeal to the kids.

Or maybe AI researchers, God knows whether there's a difference. I'm not sure. Alrighty. So that's fun. You can look it up if you want to play around with chatbots. And now we're going to move on to our last little segment, which is more discussing, uh, not just applications, but sort of societal implications of AI and, uh,

you know, recent news stories that are more maybe serious in some sense. Starting with our first one, which is not really a news story. It's a Medium post by Blake Lemoine about a history of ethical AI at Google. So it starts with saying that this person has had the privilege of being at Google for six years now.

And this post explains from this person's perspective how the ethical AI team was created by one woman over the course of four years. And that one woman is Margaret Mitchell. And yeah, this is not overly detailed, like a 10-minute read that basically traces the creation of...

the ethical AI team and in particular sort of details the impact that Margaret Mitchell and Timnit Gebru had on it. And of course, the reason this is being written, if you haven't been listening to the podcast, because we've discussed it a lot, you know, there's been a lot of news in the past year about first Timnit Gebru being fired from Google and then Margaret Mitchell being fired just a few months ago.

And now there's a lot of negative feelings and criticism and general ill will towards Google for basically firing the two people who really built up the ethical AI team and seemingly not having a very good justification for it.

So I found this to be an interesting read, sort of this summary that's really meant to give credit to Margaret Mitchell and Timnit Gebru and kind of make their contributions very clear, partially because, of course, Google now will not be their cheerleader. You know, there'll probably be some efforts to kind of paint over that history and move on.

And that's unfortunate. So it's good that people still exist to be able to tell of their contribution. What do you think about this one, Sharon? Yeah, I mean, it's really meaningful that all these people have come out to not only just support, but also, you know, write something like this and actually put the effort in. Like there is a lot of effort in writing this large blog post. So, yeah.

I think it speaks to who Meg was as a person and just, yeah, how important she was for a lot of people at Google. Definitely a role model for aspiring managers. If people who report to you feel inspired by you this much, it means something. So I hope she knows that at least.

Yeah. And yes, Google will probably want to paint over it in some way. They want to move on. They want to say that they're still doing ethics, but in their own way. I think it's a very different thing now, um, in how everything has culminated. Yeah. Yeah, exactly. And, um,

This post does conclude with a reflection on the state of the team now. It ends on a somewhat positive note. It says that hopefully ethical AI may someday again be as robust at Google as what it was. Unfortunately, today we are bleeding minds because the best of the best no longer trust Google leadership to make right decisions.

And I've had multiple conversations with other ethical researchers and engineers about whether or not to quit.

Many have considered it, and then it goes on to say that those who are staying are there to make sure that all the work that Margaret Mitchell did will not be for nothing. And then, yeah, so they're staying to be at Google to do the work, to build on Margaret Mitchell's legacy of having built up a team

And, yeah, it says that hopefully the connections and expertise built over the past four years are sufficiently robust to weather this storm, which is, I think, a nice message to send at this point. You know, obviously, I think for many people, especially in ethical AI, this whole thing was pretty shocking. I mean, it's really, you don't see these sorts of very high profile, you know,

firings of real leading experts in AI. And so this was really a huge deal. And it's nice to see that people are sort of looking forward as well and saying that, you know, we can rebuild and keep going despite what's happening.

Right. And on the topic of, you know, ethics and just societal impact of AI, our last article is from the Twitter blog.

titled Sharing Learnings About Our Image Cropping Algorithm. And this was in light of a lot of activity on Twitter about how the automatic cropping algorithm handles images. So like you post an image, but it's too big to look at, and Twitter automatically crops it, finds the best place to crop it with their AI algorithm, and then has people look at that through the feed. But what people found was

You know, there's some controversy, right? Because it's like, oh, why would they always crop this person versus that person? And so Twitter actually did a study and found that between men and women, there was an 8% difference

from a demographic parity in favor of women. Comparing black and white individuals, there was a 4% difference in favor of white individuals. And looking at comparisons between black and white women, there was a 7% difference in favor of white women. And finally, black and white men, there's a 2% difference in favor of white men. So there are clear leanings. The percentages are, of course, small, but that's still...

That still, in absolute terms, does mean a decent amount of images on Twitter. And Twitter now has launched a feature that basically is just saying, hey, we will let you, as a user, control where to crop. So you can still decide. And we won't necessarily force it to be one way or another.
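To make the reported numbers concrete: one plausible way to read a "difference from demographic parity" is as a deviation from a 50/50 split over many paired comparisons, so parity would mean each group gets favored by the crop half the time. The snippet below is purely illustrative, with made-up counts rather than Twitter's actual data or exact methodology:

# Illustrative only: one reading of a "difference from demographic parity",
# using made-up counts (not Twitter's data or exact methodology).
def parity_gap(times_group_a_favored, times_group_b_favored):
    total = times_group_a_favored + times_group_b_favored
    # 0.0 would be perfect parity; positive values favor group A.
    return round(times_group_a_favored / total - 0.5, 3)

# If the crop favored women in 58 of 100 paired images, the gap is 0.08 (8%).
print(parity_gap(58, 42))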

And I remember talking to someone and they said, why wasn't this the original thing? Like, why do we have to go in circles for this to be it? But I think it's that it felt like it would be simpler if the user didn't have to see controls or something like that. But yeah, I'm glad Twitter reacted well to the discussions on their platform and did some studies with data and have improved the experience.

Yeah, exactly. I think just about everyone that I've seen react on Twitter to this blog post likewise said that it's really commendable how they approached this, in terms of there was controversy about it. At the time, last October, people were posting,

you know, basically experimenting and showing that under some conditions it would crop to show, you know, a white person instead of Obama or things like that. And, um,

At that point, it was all very sort of ad hoc. It wasn't really a study per se, but they took those concerns in mind and did this proper study and then found results and then concluded and changed the service. And I agree that, you know, probably they originally used this machine learning thing to make posting quicker, to, you know,

you know, not force people to worry about cropping, to make sure that all the photos are the same size. But, um, yeah, it is commendable that they went back on a decision, you know, figured out that, uh, in some cases it's good to just not have an algorithm make decisions and instead allow a person to be in control. Yeah. Kind of like a feel-good story, really, for AI and, uh,

you know, a good example of what a good ethical AI team within a company should be able to do. Right. So nice way to end this episode, I suppose, with a pretty positive outcome. Oh, yeah, we love that.

And with that, thank you so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at skynettoday.com. Subscribe to us wherever you get your podcasts and don't forget to leave us a rating and a review if you like the show. Be sure to tune in next week. All right.