Hello and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what are just clickbait headlines. This is our latest Last Week in AI episode, in which you can get a quick digest of last week's AI news, as well as a bit of discussion between two AI researchers as to what we think about this news.
To start things off, we'll hand it off to Daniel Bashir to summarize what happened in AI last week. We'll be back in just a few minutes to dive deeper into these stories and give our takes. Hello, this is Daniel Bashir here with our weekly news summary. This week, we'll look at a new Boston Dynamics robot, UBI, error-laden datasets, and bias in speech recognition.
Boston Dynamics is already well known for Spot, its robot dog designed to work in a range of environments. In addition to research, the company has been focused on logistics recently. As The Verge reports, the lab released a new robot named Stretch on March 29th. Rather than being bolted in one place with a workflow centered around it, Stretch is designed to slide into existing workplaces and load or unload goods.
According to Michael Perry, Boston Dynamics VP of Business Development, the robot could allow the lab to target customers who would otherwise avoid automation as too expensive or too time-consuming to integrate. Boston Dynamics claims Stretch can move up to 800 cases an hour, comparable to the throughput of a human employee.
If you've been following recent political debates, you've probably noticed increased mentions of universal basic income and similar mechanisms for helping people deal with automation-induced job loss. CNBC reports that in a recent blog post, OpenAI CEO Sam Altman wrote that in a decade, AI could generate enough wealth to pay every adult in the US $13,500 a year.
But critics are concerned that Altman's view of this potential future could do harm, because it envisions a world in which all non-AI companies are run out of business and even a fraction of OpenAI's income could fund a basic income for every American citizen. Of course, this was only a blog post, and Altman's words were intended to be a conversation starter.
As reported by the MIT Technology Review, a new study from MIT has found that the top 10 most cited AI datasets are riddled with label errors. Given that we evaluate the accuracy of machine learning models against labeled data, this certainly makes it difficult to know whether models with high accuracy are actually doing what we want them to. And the article claims, "This is distorting our understanding of the field's progress."
Because some of the most important datasets in AI, including ImageNet, contain labels that range from problematic to flat-out wrong, this may have some important implications for the field.
The researchers found that among 34 models that had been measured on ImageNet, those that didn't perform so well on the original incorrect labels were some of the best performers once the labels were corrected, and that simple models seemed to fare better than more complex ones on the corrected data.
Finally, a new study from the University of Amsterdam, the Netherlands Cancer Institute, and the Delft University of Technology found that even state-of-the-art automatic speech recognition systems struggle to recognize the accents of people from certain regions of the world. As VentureBeat reports, they found that an ASR system for Dutch recognized speakers of specific age groups, genders, and countries of origin better than others.
The researchers found that the ASR system they evaluated recognized female speech more reliably than male speech regardless of speaking style, and that it struggled to recognize speech from older people as compared to younger. It also had an easier time detecting speech from native Dutch speakers than from non-native Dutch speakers.
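(For listeners curious how this kind of group-wise comparison is typically done: below is a minimal sketch, not code from the study, of computing word error rate separately for each demographic group. The sample records, the group fields, and the simple edit-distance WER are all illustrative assumptions.)

```python
from collections import defaultdict

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance (illustrative, not the study's exact metric)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical evaluation records: (demographic group, reference transcript, ASR output).
records = [
    ("native, female, 18-30", "ik ga morgen naar amsterdam", "ik ga morgen naar amsterdam"),
    ("non-native, male, 60+", "ik ga morgen naar amsterdam", "ik gaat morgen na amsterdam"),
]

errors_by_group = defaultdict(list)
for group, ref, hyp in records:
    errors_by_group[group].append(wer(ref, hyp))

for group, errs in errors_by_group.items():
    print(f"{group}: mean WER = {sum(errs) / len(errs):.2f}")
```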
The researchers point out that it's impossible to fully remove the bias that creeps into datasets, but one solution may be mitigating bias at the algorithmic level. That's all for this week's news roundup. Stay tuned for a more in-depth discussion of recent events. Thanks Daniel, and welcome back, listeners. Now that you've had a summary of last week's news, feel free to stick around for a more laid-back discussion about this news by two AI researchers.
I am Andrey Kurenkov, a third-year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation and reinforcement learning. And with me is my co-host.
Hi, I'm Sharon. I'm a fourth-year PhD student in the machine learning group working with Andrew Ng, and I have successfully defended. I do research mainly on generative models, as well as on improving generalization of neural networks and applying machine learning to tackling the climate crisis and to medicine. Yeah, as you noted, we've taken a bit of a break from the podcast and a lot of exciting things have happened. So maybe you can share...
what defending means. You're not a PhD candidate, right? You're done.
I'm done. I guess I'm almost a PhD graduate, after I accrue the right amount of credits. But yeah, I'm basically there, which is very exciting. So a defense is, you've got to prepare a sword and you've got to fight all these snakes that your committee throws at you. And depending on how good you've been as a PhD student,
that determines the size of the snakes as well as their violence. Okay, no, more seriously, you give a presentation and then they grill you with questions, which actually wasn't horrible, you know, in like a private room. And then there's also a public setting as well. And
yeah, your entire PhD is supposed to culminate in this moment, really. And you talk about your research and you somehow weave together a story. And that's what it is. And then the written form of that is your thesis, which you then submit afterwards. Or in my case, I guess I submitted it before. But yeah, that is.
That is it. I am free. Free to do other AI-related activities. Maybe no more research for you, but you're still an expert. So you're still fit to co-host this podcast. I mean, you don't have to be a PhD student to talk about it, right?
I'm looking forward to doing my defense sometime soon. Well, within a year maybe, you know. These things take four, five, six years. So it's a journey, the PhD. Yes, it is definitely a journey. Yes. All right. Well, congratulations, Sharon. Very exciting to hear your news.
And listeners, we are excited to be back and get back to weekly discussion of AI news. To start things off, we have our first piece of news, which is that a Google AI research manager quit after two were ousted from his group. So this is about Samy Bengio, who is kind of a huge figure in AI.
And he managed hundreds of researchers in the Google Brain team and announced recently that he is no longer going to be with Google. He is quitting and will be out of the company by April 28th and will be looking for other opportunities.
This follows on the heels of kind of a tumultuous period for Google, where two of the leads of their Ethical AI team, Timnit Gebru and Margaret Mitchell, were let go. First, Gebru was infamously fired, and then two months later, Margaret Mitchell was also fired after trying to fight back and
basically criticizing the leadership. And all indications seem to be that Samy Bengio has been interviewing and thinking about other positions, and has now resigned because of these events. It's kind of big news. Huge news, yeah.
Based on what I've heard internally, it shouldn't be surprising, though, because he was extremely dismayed and he was also blindsided when they fired Timnit. Like, he didn't know at all until Timnit asked him why she was locked out. And so, yeah, I think in many ways...
It's not surprising that Samy has left. It's very sad that he's left. And it just straight up shows that they're not doing ethics. Basically the saga continues, and it's going to continue for a while. I think Margaret Mitchell did mention, you know, it's only going to continue onwards, because other companies are now just poaching these people, and these people are easy to poach, you know, like...
Yeah, because they're already kind of looking around. So it takes time to move to a separate company, and so that's kind of what's happening right now. Yeah, this really makes you wonder how recruiting will work for Google Brain. Definitely, they are still one of the bigger labs, and they have really nice things to offer. But for fresh PhD graduates, I think this whole episode, even if you're not researching ethics...
will really make you question whether it's the right company to work for or whether you should take one of the many other options available now. Right. Exactly.
Well, onto a more cheery topic. Our next article is titled Robots of the Future at Boston Dynamics. And I believe it's actually a video from 60 Minutes. And it is about Boston Dynamics and all the crazy different things that their robots can do.
So this includes the humanoid robot Atlas that can run, spin, and jump. And it can also make some of those mechanical decisions autonomously, like staying balanced when it's asked to run.
It also features Spot, which is their robot dog, and that can be driven around. It can climb over steps, over rocks, and it knows how to balance really well too, which has been a huge issue in robotics. And so it's a really impressive robot. And it's actually on the market for about $75,000 apiece. And police departments are apparently using them to assist with investigations.
So this is a nice kind of video. And I believe at the very end, they also showed the dancing robot video, which you might have already seen. But it's a very nice overview of Boston Dynamics.
Exactly. Yeah. There's been a lot of viral content from Boston Dynamics. I'm sure most of the listeners are aware of the company, but this is a nice sort of mini documentary that goes a bit more into how the company is run, how these robots work, what powers them. And I think it might contradict some of the assumptions people make and maybe make people less scared of these robots.
So definitely a fun watch. It's titled Robots of the Future at Boston Dynamics. It's free on YouTube. So if you're a fan of robotics or Boston Dynamics, you can check it out for a lot of footage of these robots and the people who make them too.
Okay, nice and short there. Let's move on to the next article. Silicon Valley leaders think AI will fund cash handouts.
And this is a prediction from the CEO of OpenAI, Sam Altman, who wrote in an opinion piece that in as little as 10 years, AI could generate enough wealth to pay every adult in the US $13,500 a year. And that's in the article Moore's Law for Everything.
There were others who responded. For instance, Glen Weyl, an economist and principal researcher at Microsoft Research, wrote that "this beautifully epitomizes the AI ideology that I believe is the most dangerous force in the world today."
Another industry source told CNBC that Altman envisions a world wherein he and his AI CEO peers become so immensely powerful that they run every non-AI company that employs people out of business, driving every American worker into unemployment. So powerful that a percentage of OpenAI's and its peers' income could bankroll universal basic income for every citizen of America.
Yeah, so that's pretty much the summary. This was really just one Silicon Valley leader rather than many, although I think many can relate to this topic. Let's discuss, Sharon. Not a hint of grandiosity at all. Yeah, this reminds me of how, for decades, people have been predicting the singularity, right? Where AI gets super, super smart and then
We have a sort of utopia. And yeah, this is pretty much that idea, but restated. I don't know if you have any thoughts on Singularity or this specifically.
Yeah, I mean, I can see that. I would be happy if they would actually bankroll UBI. But of course, concentration of power like that is typically not the greatest for everyone, especially now with globalization. People's happiness is relative, right, to the people around them. And so there's such a big disparity between the wealth that you have and the wealth that you see others have. Because of globalization, because of the internet, it's really easy to see the person at the very top. And I think Instagram very much does this, because then it's not even just the top, it's also filtered to be the best stuff of someone's life, and sometimes not even real, right? It's very Photoshopped. So, yeah,
I don't know how to feel about this. I mean, I could see these people becoming very wealthy, if not already. I hope they have big hearts. I agree that it's not very different from people in the past, maybe like oil lords and stuff like that acting this way. It feels generational. I don't know. I honestly don't see it as a huge difference. I think one big
One big defining factor of the information age that's interesting is that compute is not a resource where, once you use it, it's gone, like with oil. Right.
So you can use it again. It's almost recyclable, and arguably precious metals are in some ways recyclable too, because someone else can also have them. But I think there's something even more here, where you can just continue pumping it out on the same machine. And so it is different from something like oil. So, yeah. Yeah, I think one interesting thing, reading a bit more into it here,
is that the article name is a bit misleading in that the actual content of the piece starts off saying that within 10 years, you can actually just...
change the tax structure, so tax companies and land instead of labor. And yeah, it's saying the American equity fund would be capitalized by taxing companies above a certain valuation, 2.5% of their market value each year, payable in shares transferred to a fund, and by taxing 2.5% of the value of all privately held land assets.
So, yeah, actually, this is a fairly straightforward argument for a universal basic income that would be paid for by taxes. And I guess the thesis of the piece is that eventually AI would make all these companies way more valuable because of what they can do, and then there'll be more money to go around.
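(As a rough sanity check on those numbers, here's a back-of-the-envelope sketch, not taken from Altman's post or the CNBC article; the US adult-population figure and the implied asset base are illustrative assumptions.)

```python
# Back-of-the-envelope check on the "Moore's Law for Everything" proposal (illustrative assumptions).
ADULTS_IN_US = 250_000_000   # assumed US adult population
PAYOUT_PER_ADULT = 13_500    # dollars per year, the figure cited in the post
TAX_RATE = 0.025             # 2.5% of company market value and of private land value per year

required_fund = ADULTS_IN_US * PAYOUT_PER_ADULT
print(f"Annual payout needed: ${required_fund / 1e12:.2f} trillion")

# Implied size of the taxed asset base (company market value plus private land value)
# for a 2.5% annual levy to cover that payout.
implied_asset_base = required_fund / TAX_RATE
print(f"Implied taxed asset base: ${implied_asset_base / 1e12:.0f} trillion")
```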
I definitely am skeptical of this prediction that AI will fund basic income within 10 years, that's for sure. So I was very skeptical going in. But it seems like his actual proposal is a bit more nuanced and
not quite what the headline says. But still, Sam Altman is famously optimistic about AI. Many people doubt that AI will accelerate as quickly as he predicts and as OpenAI is hoping it will. Right, exactly. Yeah.
Well, on to something less future-thinking and more about what just happened. Our next article, from the Technology Review, is titled Deepfake Amazon Workers Sowing Confusion on Twitter.
All right. So there are some Twitter bots that are using deepfake photos to seem, I guess, like real people, as well as potentially synthesized text, and that are posting all these compliments about Amazon. And this is right before that landmark vote that could lead to the formation of the first-ever labor union at an Amazon warehouse. And so of course, Jeff Bezos et al. don't want this to happen. They don't want this union. And,
it's very interesting that the company seemed to have deployed these deepfake accounts. And then I think people started to pick up on it because it was too obvious that they were fake. And again, the deepfake part is literally just the profile picture, like a stock photo. So you could grab any stock image, honestly. Yeah.
But the fact that they did this and did it so, you know, over the top, people started picking up on it and started parodying it as well. And so it became quite funny. I did not see this on Twitter since I've been, as I told Andre previously, you know, blocking myself on Twitter to be productive. But I don't know. Did you see it, Andre? And what are your thoughts on this?
No, I haven't. I've also been not very active on Twitter. You know, it's kind of a time sink, so not always useful. Oh, really? No, I'm kidding. Yeah, this is not a very pro-Twitter podcast. But this is a funny story. We've seen deepfake photos be used for profile pictures a couple of times before.
So this is sort of showing that there is a bit of a trend. One thing to note here is that Amazon told the New York Times that it did not set up these accounts. And the journalist or someone investigating this also checked that, and it seems unlikely. And that makes sense, I think, because this would be...
pretty obviously ridiculous for Amazon to do. I mean, they already had this program with real people a few years ago, right? Which was also hilarious and spawned these parodies. So to do it again with AI is probably not the best idea. Yeah.
Yeah, so pretty funny story, not any serious consequences, obviously, but maybe another sign of how we should worry, given that there is a lot of misinformation and a lot of campaigns on Twitter and AI technology will only make that more widespread going forward. Yeah.
Yup. It is so silly. Yeah, anyways, it just seems so ridiculous. I would be excited, though, if we had the parody accounts, the humorous accounts, and there are a lot of, you know, funny AI bots on Twitter. So that's definitely a better application of the powers of AI, I think.
Twitter is also pretty fun. You can follow a lot of AI artists, which I have done before. Oh yeah, that is very fun. Yeah. So yeah, don't follow big companies. Follow artists and creatives for the best applications of AI.
And on to our last piece, we have the article MIT Study Finds Systematic Labeling Errors in Popular AI Benchmark Datasets. So in this new paper and website published by researchers at MIT, there is an analysis of the test sets of 10 datasets, including ImageNet and other popular datasets, and they found that on average there were 3.4% erroneous labels across all the datasets. So for instance, there were around 2,900 errors in the ImageNet validation set, and there were 5 million errors in QuickDraw, which is kind of a crowdsourced dataset of drawings.
And to give some perspective, this is really bad because test sets are how we evaluate our algorithms. So we use the train set to optimize the model and use the test set to get a number that says this is how good it is. And
Yeah, so this kind of throws a lot of the results in a lot of papers into doubt. They even showed that when you correct the labeling errors, larger models performed worse than lower capacity counterparts because the larger models reflected the labeling errors, whereas the smaller ones didn't.
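(To make that concrete, here's a minimal sketch, not the study's actual code, of re-scoring a set of models against corrected test labels and checking whether their ranking changes. The model names, predictions, and labels are all made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # size of a hypothetical test set

# Hypothetical labels: the "original" test labels plus a corrected copy
# in which roughly 3.4% of labels have been fixed (all synthetic).
original_labels = rng.integers(0, 10, size=n)
corrected_labels = original_labels.copy()
flipped = rng.choice(n, size=int(0.034 * n), replace=False)
corrected_labels[flipped] = (corrected_labels[flipped] + 1) % 10

# Two made-up models: one that closely mimics the original labels (errors included),
# one that is slightly noisier overall but closer to the corrected labels.
predictions = {
    "big_model":   np.where(rng.random(n) < 0.95, original_labels,  rng.integers(0, 10, size=n)),
    "small_model": np.where(rng.random(n) < 0.93, corrected_labels, rng.integers(0, 10, size=n)),
}

def accuracy(preds, labels):
    return float((preds == labels).mean())

# Compare each model's accuracy under both label sets; the ranking can change
# once the corrected labels are used for scoring.
for name, preds in predictions.items():
    print(f"{name}: original = {accuracy(preds, original_labels):.3f}, "
          f"corrected = {accuracy(preds, corrected_labels):.3f}")
```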
And I don't think that's surprising. If you're going to have more parameters, you're probably going to be memorizing more data, right? So it's not going to be... It might not generalize as well because it's literally just memorizing a lot of these data points, especially if a lot of them are probably incorrect. And I think it's just like we are...
still climbing on noise, in a sense. Like, we are improving things on some benchmark based on this test set, but the test set itself is very noisy. So are we really improving when a paper publishes saying it got 1% better? You know, what if it got 1% better on the incorrect stuff? Like, oh, that's bad, you know.
So this is very problematic in AI, I think, especially since we trust a lot of labeled data. And even when things are checked once or twice, there are still errors. It's just very human to make a lot of errors. And so for medical datasets, we would have, you know, several doctors label the same thing, and then we would kind of average that to reduce noise. But yeah,
Even then, there's still kind of incorrect stuff, arguably. Yeah, exactly. I think these days, data sets are so essential to AI that it's good that there's a lot more research like this that really examines the data sets and scrutinizes them and shows kind of what is in there, which a lot of researchers take for granted.
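(Since Sharon mentioned averaging labels from several doctors, here's a minimal sketch of one common way to do that: majority vote across annotators, with a flag for low-agreement items. The annotation data and the agreement threshold are illustrative assumptions, not anything from the paper.)

```python
from collections import Counter

# Hypothetical annotations: each item was labeled independently by several annotators.
annotations = {
    "scan_001": ["malignant", "malignant", "benign"],
    "scan_002": ["benign", "benign", "benign"],
    "scan_003": ["malignant", "benign", "uncertain"],
}

AGREEMENT_THRESHOLD = 2 / 3  # flag items where the winning label has less support than this

for item_id, labels in annotations.items():
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(labels)
    flag = "" if agreement >= AGREEMENT_THRESHOLD else " (low agreement, send for re-review)"
    print(f"{item_id}: {label} ({agreement:.0%} agreement){flag}")
```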
The good news here is that when you correct the errors, so here they corrected for a smaller dataset and measured AI models trained and tested on the corrected dataset, the results are largely the same.
I feel like there was a paper about that recently, something like that. It was very similar to that paper from a couple of years ago, where they collected another ImageNet test set, or another CIFAR test set, in a similar way. And they found the results weren't as good, but the ranking of the models was still the same.
Yeah, I do remember that, which was pretty interesting. That was really interesting. I think that's very consistent with these results. Yeah. I think the name of that paper was something like, Do Computer Vision Models Transfer from ImageNet to ImageNet, or something like that? Yeah, something like Do ImageNet Models Generalize to ImageNet, something like that. It was spicy. It was good. That was a fun paper. Yeah. Yeah.
Yeah, and so yeah, this shows more kind of how important dataset creation is. We've discussed in the past how a lot of datasets have issues of bias and not enough of certain types of people, and that leads to models being worse, typically for non-white people. So just another angle from which
to understand the importance of how to create datasets. I think, I mean, to some extent this is not surprising, because these labels, since these are such massive datasets, are crowdsourced, and that always means they're not completely reliable. And I would imagine that
With newer datasets, people are probably better able to filter out problems and sort of the methodology for crowdsourcing labels has improved in the past decade. Sharon, do you know this? I think you've had some experience with label collecting.
Yes, crowdsourcing techniques have definitely improved. I'm not sure if everyone's employing them, though. This is something I've definitely observed, where there's plenty of research in this space, but it's not like people are reading that research as much, or implementing it, or it's not easy to implement. So, yeah, I think if Amazon Mechanical Turk or some of these actual platforms were to change some of the ways things are crowdsourced, that could significantly benefit everyone who's using them.
Interesting. Well, there are now a lot of startups other than Amazon Mechanical Turk for collecting labels. I think there are a lot of options these days for getting labels. So hopefully some of them are taking notice, and hopefully some researchers are also taking notice. Certainly when you start working on a large dataset, you're probably looking for some of the best practices, I would imagine.
Right. Exactly. Hopefully. I cannot confidently say that people were doing that. I think people wanted to label as much stuff as possible, so maybe it was based on heuristics or something like that, but not on very refined heuristics either. Because if you did add those up... I don't know, in a research paper you don't want to be like, hey, we tweaked it this way, we did it that way, kind of thing. So yeah,
I can see how these general practices developed. Like, all we did was this, and then we, you know, quickly had someone check 5% of the data and it looked fine.
Well, you know, at least this is out there now, and maybe it'll make some people a little more conscientious and aware of the potential issues with datasets, as we've seen with other research on this. Oh yeah, definitely. Definitely. I'm excited to see where that goes, especially for things that are now starting to seep into actual applications, such as self-driving or something like that.
And with that, we're going to wrap up. Thank you so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. We are super happy to be back from our break and resume our weekly schedule. As always, you can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at skynettoday.com.
Subscribe to us wherever you get your podcasts. And don't forget to leave us a rating and review if you like the show. Be sure to tune in next week.