Hello and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's going on with AI. This is our latest Last Week in AI episode, in which you get quick summaries and discussion of some of last week's most interesting AI news. I'm Andrey Kurenkov, a third-year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation. And with me is my co-host...
Hi, I'm Dr. Sharon Zhou, a graduating fourth-year PhD student in the machine learning group working with Andrew Ng. I do research on generative models and applying machine learning to medicine and climate.
And our first article today, on research, is titled "GPT-3: A Disappointing Paper." This was a blog post on greaterwrong.com that kind of went through the GPT-3 paper and talked about why it was so disappointing for them. And so this was a
self-described enthusiastic user of GPT-2 who wrote a lot about GPT-2, but was really disappointed by GPT-3 because it just felt like it was a bigger GPT-2. And to GPT-3's credit, I would say that it is. And they do say that in the paper quite point-blank. But what are your thoughts on this, Andrey? Any points that really jumped out at you in this blog?
Yeah, I was a bit disappointed with this blog post, even though it's always fun to hear people be very critical sometimes. But yeah, as you said, the gist of the disappointment was just that, you know, apparently they wanted more than just more GPT-2. And I think...
It's weird to some extent that this person seems to discount the actually interesting part of GPT-3, which is the whole idea that it's a few-shot learner, and that with just a few examples, without updating the weights or optimizing at all, it can do a lot of stuff.
And then, yeah, in this blog post, this person says, "I can imagine someone viewing this as very important, if they thought it showed an ability in transformer language models to pick things up on the fly in an extremely data-efficient, human-like way." Well, I think you should imagine that, because that's exactly what people get excited about, including me, this being a fairly novel phenomenon as far as we can tell. So, yeah.
I don't know. It's a fun read in a way, but it doesn't say anything too insightful, I wouldn't say. Yeah, and I understand the kind of disappointment around GPT-3 not having any conceptual transformations, but it does show qualitative improvements that really change how we view language models and kind of push an inflection point in this field.
But of course, with few-shot learning, it's not, you know, it's not perfect. And that kind of brings us to the second article under research, which is titled "NYU, Facebook and CIFAR Present 'True Few-Shot Learning' for Language Models Whose Few-Shot Ability They Say Is Overestimated." And so, Andrey, do you want to talk a bit about this article? Sure.
Yeah, this I found quite interesting, because as we've just said, with GPT-3 and other models recently, there's this whole idea that you can just train a model and then give it an example without re-optimizing it at all, just give it an input and ask it to continue, and it can do various things. Like you give it an example of an addition as its input, and then it can continue doing that.
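(To make that concrete, here is a minimal sketch of the kind of few-shot prompt being described. The `complete` function is a purely hypothetical placeholder for whatever language-model API is in use, not any specific library; the point is just that the "examples" live entirely in the input text and no weights are updated.)

```python
# Minimal sketch of few-shot prompting: the model's weights are never touched;
# the "training examples" are simply part of the prompt.

def complete(prompt: str) -> str:
    # Hypothetical placeholder: a real system would return the model's continuation.
    return "<model continuation>"

few_shot_prompt = (
    "Q: 12 + 7\nA: 19\n"   # in-context example 1
    "Q: 30 + 4\nA: 34\n"   # in-context example 2
    "Q: 25 + 9\nA: "       # the query we want the model to continue
)

print(complete(few_shot_prompt))
```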
And it seemed to work really well. And so far, it's been mostly qualitative, though there have been some efforts to make it more quantitative. And this one basically took a critical look at how that works. And what they showed is basically that people have sort of been cheating. Like, yeah, when testing the...
few-shot capabilities, they were doing model selection on the validation set. So they trained a bunch of different models with different hyperparameters, and they picked out the best one on a validation set. And so here, the team...
has an idea called true few-shot learning, where you can't use any sort of validation dataset. You only have your training set, which you're given. And so you only have the actual few shots that you're given. And it worked a lot worse, right? And it's also hard to choose good prompts, so you might get unexpected results.
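(Roughly, the contrast looks something like the sketch below. It assumes a hypothetical `score(prompt, context, eval_examples)` function that measures accuracy when the model is conditioned on a prompt plus some in-context examples; this is an illustration of the idea, loosely in the spirit of the cross-validation criterion the paper studies, not the authors' code.)

```python
# Standard (criticized) practice vs. "true" few-shot prompt selection.

def choose_prompt_standard(prompts, validation_set, score):
    # Prompt/model selection on a large held-out validation set, which is
    # data a genuinely few-shot learner would never have access to.
    return max(prompts, key=lambda p: score(p, context=[], eval_examples=validation_set))

def choose_prompt_true_few_shot(prompts, few_shot_examples, score):
    # "True" few-shot selection: leave-one-out cross-validation over only
    # the K labeled examples we were actually given.
    def loo_accuracy(prompt):
        total = 0.0
        for i, held_out in enumerate(few_shot_examples):
            context = few_shot_examples[:i] + few_shot_examples[i + 1:]
            total += score(prompt, context=context, eval_examples=[held_out])
        return total / len(few_shot_examples)
    return max(prompts, key=loo_accuracy)
```

The second function only ever sees the K given examples, which is why performance drops and prompt choice becomes much less reliable.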
So, yeah, I thought this was quite interesting. And I think this is a nice example of how, you know, even if things found in AI are often a bit sloppy or turn out to be not quite what people thought, we are being critical of each other. And then, you know, there are papers like this that
do point out where the community has misconceptions and is, you know, doing science wrong, and try to correct for it. So, yeah, I think it's quite neat. And I think in general, you know, with a lot of these models and setups, I see a lot of training-on-the-test-set type situations. And this few-shot learning before this true few-shot learning isn't, you know, quite as extreme, but it is, you know, using the validation set in a way that makes it kind of unfair and not a realistic scenario. And I do see this elsewhere, especially in areas that are applying machine learning, where people think that it's okay to just use the validation set, or even a test set that's not completely held out, to do model selection.
And that's really bad, because you're definitely overfitting, or you're doing something with your test set so that it's not really held out or realistic. So I think this is important work moving forward. Yeah, for sure. And I find this pretty funny working in RL, because you effectively always use a test set. Yeah.
I'll quote you on that. That's enough. I'm just kidding. Yeah, yeah, yeah. Well, everyone knows it. It's an open secret. Everyone knows, yeah. Yeah, so fun discussion, research still may be questionable. Don't trust any papers in research. That's what we are concluding. You know, everything is questionable. That's just AI for you. Yeah.
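(To spell out the evaluation hygiene Sharon is describing, here is a bare-bones, purely illustrative sketch of the usual discipline: carve out validation and test splits up front, select models on validation, and evaluate on the test split only once at the very end.)

```python
import random

def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=0):
    # Shuffle once, then carve out validation and test splits. The validation
    # split is what model selection is allowed to use; the test split should
    # only ever be evaluated on once, at the end, so results stay realistic.
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```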
But let's move on from research to real-world applications, where maybe, you know, it's a bit more high stakes and you don't want to be
quite as questionable. And we have our first example with this article, "Google Ventures-Backed Merlin Labs Is Building AI That Can Fly Planes." So as the title implies, this is about Merlin Labs, which develops autonomous systems that fly airplanes and which just emerged from stealth with $25 million in funding from Google Ventures and other investors.
And it says it wants to be the definitive autonomy platform for things that fly. And yeah, it's now at 50 employees, has a dedicated flight facility at the Mojave Air and Space Port, and apparently its system has already piloted hundreds of unmanned test flights. So, okay, that seems pretty cool. It does make me wonder, you know...
This is an example where you especially want your system to be reliable and to not break. And that is definitely something that's tough for AI. So, yeah, it's cool, but I do wonder if they can really use AI while still being safe. What do you think, Sharon?
I'm surprised that this is just being announced now. I feel like I've been waiting for this since the self-driving car craze. It just seems obvious that there should be a self-flying plane. And I think for planes, you know, a lot of planes are almost autonomous. They're not obviously level five autonomy, but they have a lot of autopilot going on.
And the humans are largely, some people say, you know, kind of in the loop for a lot of commercial flights, compared to, of course, driving, where that's not necessarily the case. And so I can definitely see this happening, of course.
We want to be much more careful with the plane. There are a lot more people on board. Of course, with cargo, maybe that's not as bad. That's kind of like a cargo ship or even a truck. But, you know, there are issues with a plane crashing for sure.
So I'm excited to see where this goes, is all. And it sounds like they have enough runway to work on it for quite some time until they need, let's say, some revenue from a government deal or something like that. Yeah, it's always exciting to find a new application of AI. And this is one that I haven't seen. So let's wait and see. And maybe we'll all be able to take cheaper flights eventually.
Speaking of autonomous vehicles, we have another story here about a more traditional kind. As you mentioned, we have a self-driving truck that completes a 950-mile trip 10 hours faster than a human driver.
And so, yeah, this is a story about this company, TuSimple, which is a transportation company focusing on driverless tech for trucks. And it said how 80% of the journey of a long-haul truck transporting a load of watermelons from Arizona to Oklahoma was driven by the autonomous system, with a human at the wheel for the other 20%.
And how that, as the title says, was 10 hours faster than a human driver. Yeah, so I guess it's neat to see this example. I do wonder if this is really surprising or new. My impression is these sorts of demonstrations have already existed. What do you think, Sharon? Yeah.
I'm actually quite impressed with how it's, you know, nearly level four autonomous or something like that. And it's really getting there and showing serious promise over humans. I think there is the residual question of, well, there is a human on board who's supposed to take over if something's wrong.
Isn't that person supposed to be taking sleep breaks? Shouldn't the car, the truck be resting as well? And I think, you know, obviously what's kind of most likely happening is the truck is driving just fine on its own and the person is asleep for some of that stretch. So it's...
You know, like, there will be some safety issues, I imagine. But I can see how this is where things are rolling out much more quickly than, you know, city driving, for example, with people dashing about everywhere. This is just largely empty, long stretches of road that are straight, hopefully.
Yeah, I think it does make more sense to be less strict on highways, right? Straight driving for miles. You could see it being okay with not personally being at the wheel. So probably we'll see a lot more autonomous trucks before autonomous cars out there. It makes sense. Right, right.
And now shifting over to societal uses of AI and societal implications of AI. Our first article here is "King County is first in the country to ban facial recognition software," and this is from KOMO News.
So King County, which Seattle, Washington is in, is actually the first county in the U.S. that is banning facial recognition software for good. And, you know, supporters are very much applauding this move.
No government agency in King County was using facial recognition software, and yep, this basically bans it from the entire county. And the police haven't been using it either, so it wasn't a huge, huge, you know, controversy for them.
And from what I know about Seattle, I'm not super surprised that they were the first county. I could see them very much supporting something like this. Maybe Portland, or their county, is who I would have expected to be first. But I'm not super surprised that they've pushed forward on this. Yeah, exactly. I mean...
I guess it's news or cool in the sense that this is the first county that has done it. And I think that's also what has been reported and why it's a big deal. This article does note that right now, Portland, San Francisco and Boston already have similar bans. So I guess on the city level, it's already been a bit of a trend. Now, this is a growing trend.
So it's interesting. Yeah, maybe a lot of dealing with facial recognition will be more local as opposed to federal, which kind of makes sense. I guess policing is often to some extent local. So I can see that being reasonable.
I did find it interesting that there is an exception to the software ban that allows law enforcement or government agencies to comply with the National Child Search Assistance Act. So, you know, for dire cases, I guess you can use facial recognition, but otherwise not.
Yep, that makes sense. And it's also for a certain segment of the population too, for children in particular, for better or for worse, I suppose. Yeah, probably, you know, it seems like they were very thoughtful in how they implemented this, so it makes sense.
And on to another story that is probably for the worse, I would say. Maybe definitely, actually. We have a story from The Verge on whether autonomous robots have started killing in war. So last week, there was another publication that declared, based on a UN report from the Libyan civil war, that killer robots may have hunted down humans autonomously for the first time.
And here, the killer robot is this Kargu 2 system that is a quadcopter that is built in Turkey and is just a consumer drone that has a bomb strapped to it. So it can fly in and it can be manually operated or steer itself using computer vision.
And so there was a paragraph here that notes that retreating forces were subject to continual harassment from the unmanned combat aerial vehicles and lethal autonomous weapons system, and that there were significant casualties as a result. But that's all that was in this report, actually, this one really short bit.
So even though it generated a lot of news articles, this one from The Verge actually took a deeper look into it. And right now it seems pretty vague as to whether this is really the first time, you know, or if it's actually that bad. But still, I think this is definitely a sign of things we could expect more and more, especially with this sort of thing. It seems pretty easy to build now, I would say.
So, Terminator. Yeah, we'll see how this progresses. I think, you know, it all depends on how you also define killer robot. I mean, the killer part might be pretty self-explanatory. Autonomous part is self-explanatory. But the robot part, you know, if it's a gun that's just firing on its own with bullets,
with either facial recognition or, you know, just person detection or tank detection, like, is that a killer robot too? And I think we imagine, you know, Arnold Schwarzenegger, more or less, but I guess it does depend. And based on that changing definition, maybe it's happened earlier. So, who knows, on that.
Yeah, exactly. And this article is pretty extensive. It has a discussion of, you know, the different ideas as to how to define killer robots, and the efforts for regulation of AI. So I would say it's definitely a good read for more details. But the short version is...
It's probably not that bad. If you see these headlines, then don't worry about it too much yet. But do be aware that there's no regulation on these things and that might be a problem in the future. And to end things on a lighter note...
The last article that we'll be discussing today is from Wired, and it's titled "Don't End Up on This Artificial Intelligence Hall of Shame." So there's this AI incident database, or AI hall of shame, essentially. And this is hosted by the Partnership on AI. And it basically contains...
incidents of just flops of AI, like where AI has just completely failed. And so that includes the security robot that flopped into a fountain, number 68, and Google's photo organizing service, which tagged Black people as gorillas, number 16.
And this roll of dishonor, as it's kind of called, was started by Sean McGregor, who works as a machine learning engineer at Syntiant.
And this really highlights, you know, some of the big issues out there in AI and especially within companies. So among the 100 incidents logged so far, 16 involve Google, which is more than any other company, and seven involve Amazon and two involve Microsoft. And so it's just bringing to light some of, you know, the
big incidents in AI where AI has kind of failed, and also bringing to light, you know, the companies behind that, and keeping kind of a tally, maybe keeping people accountable in some sense. Yeah, I think this is pretty neat. I don't know if this is necessarily useful per se, but yeah,
I do think that often, because AI is so young, or has been emerging recently, let's say in the past decade, and being commercialized and so on, maybe a lot of these things are unexpected, and the engineers or product people who built these things might not be aware of the ways that systems could break, and it can be pretty unexpected. You know,
Of course, you can think there might be bias, but maybe you didn't build your robot to look out for fountains, right? Maybe you didn't think about issues with facial recognition, I don't know, labeling you as something strange. So...
I could see it being useful just for people to be more aware of the weird ways in which AI fails. And I do think it's interesting that there's kind of a lot of metadata. So from these 100 incidents, 16 have involved Google, more than any other company, and then Amazon at seven and Microsoft at two. So maybe another thing about this database, beyond just having useful examples, is
It is a way to hold big companies accountable and actually make sure these kinds of things don't happen, as opposed to, you know, letting it happen and then trying to clean it up. That sort of thing. Exactly. And that's it for us this episode. But be sure to stick around for a few more minutes to get a quick summary of some other cool news stories from our very own newscaster, Daniel Bashir.
First off, on the research side, one of the major barriers to developing a successful AI project is data quality. 85% of AI projects fail, and a recent study revealed that 96% of organizations have problems with training data quality and quantity.
As VentureBeat reports, organizations are discovering that when good data just isn't available, the gap can be filled with synthetic data or artificially generated data. Second, Phys.org reports that researchers from Carnegie Mellon University developed a new process that uses machine learning algorithms to isolate natural products that could be used for developing drugs.
On the business and application side, we have two new challengers to OpenAI. According to Pulse News, South Korea's Naver Corp has unveiled a new Korean-language model system named HyperCLOVA, which boasts even more parameters than GPT-3. The second challenger comes from China.
As PingWest reports, the Beijing Academy of Artificial Intelligence launched the latest version of its pre-trained deep learning model Wu Dao, which is 10 times the size of GPT-3. Wu Dao is a multi-modal model trained to tackle both the text and image domains. And finally, a few stories on AI and society.
The infamous facial recognition company Clearview AI, as Forbes writes, has been hit with a barrage of privacy complaints in Europe, which claim the company breached the bloc's strict data protection laws by illegally using personal data. As the EU considers new AI legislation, criticism has come from the states.
as former Google CEO Eric Schmidt has warned that the bloc's AI transparency requirements would be a big setback to Europe. According to Politico, Schmidt claims that regulation would hamper US-Europe cooperation in order to compete with China on AI innovation.
And in our final story, Google continues to make public statements about its commitment to ethical AI research in the wake of the departure of star researcher Timnit Gebru. But, as current members of the ethical AI group told Vox's Recode, the team has been in a state of limbo, and they have serious doubts that company leaders can rebuild credibility in the academic community or will listen to the group's concerns.
Thanks so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed today and subscribe to our weekly newsletter with even more content at skynettoday.com. Don't forget to subscribe to us wherever you get your podcasts and leave us a review if you like the show. Be sure to tune in when we return next week.