
ACM on Facial Recognition, National AI Cloud, and Positive DeepFakes

2020/7/10

Last Week in AI

People
Andrey Kurenkov
Sharon Zhou
Topics
Andrey Kurenkov and Sharon Zhou discuss the ACM's call for a moratorium on the use of facial recognition technology, along with the reasons behind the call and what it signifies. They see the move as a sign of the AI field's growing attention to ethical issues, urging that facial recognition be suspended until relevant laws and regulations are in place to ensure its fairness and accuracy. They also analyze MIT's removal of a biased dataset, emphasizing how the quality of AI training data affects model fairness, and discuss how dataset construction and filtering can be improved to reduce bias and discrimination. They then explore the initiative for a national AI research cloud, which they believe could help close the gap in computing resources between academia and large tech companies and promote more equitable AI research, while also raising questions and concerns about its practical implementation and resource allocation. Finally, they discuss potential positive applications of deepfake technology, such as protecting the identities and privacy of vulnerable groups, and note the dual-use nature of deepfakes and the challenges they pose for regulation.


Chapters
The ACM calls for a temporary suspension of facial recognition use by governments and businesses until regulations and accuracy standards are established.

Transcript


Hello, and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what is just clickbait headlines. I am Andrey Kurenkov, a third-year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation in my research. And with me is my co-host...

I'm Sharon, a third year PhD student in the Machine Learning group working with Andrew Ng. I do research on generative models, improving generalization of neural networks, and applying machine learning to tackling the climate crisis.

And we have another set of stories from last week to discuss. Quite an interesting set of diverse topics this week. So we're just going to dive straight in, starting with the first one with an article titled ACM Calls for Governments and Businesses to Stop Using Facial Recognition. And this was covered in this case by VentureBeat.

And this is basically what it sounds like from the title. The ACM, or the Association for Computing Machinery, is a very large association for professionals working with computers. So this is like programmers, and also in some cases computer engineers.

So this organization released a statement urging lawmakers to immediately suspend use of facial recognition by businesses and governments. And this was released on June 30. And if you've listened to the last couple episodes we put out, you might already know why. There's been a lot of stuff going on with facial recognition. There were companies who themselves decided to stop doing it, like IBM and Amazon.

And there were statements by the ACLU and the Algorithmic Justice League. And so this statement by the ACM follows on all of those events.

Going into the specifics a bit more, this letter actually doesn't call for a permanent ban on facial recognition, but a temporary moratorium until accuracy standards for performance across race and gender, essentially these sensitive attributes, as well as laws and regulations, can be put in place. So essentially, the suspension would last until we have regulation in place.

Yeah, so this is pretty exciting. I think many people working on AI are in the ACM or involved in the ACM. And it shows that, I guess, all of this discussion, all these events are making inroads to basically where we are, to researchers and engineers. And the overall awareness and decision-making on this front is becoming more common, which I think is definitely a welcome development. Let's hope this is a continuation of a trend that will become more and more prominent and widespread.

For our next article, it's titled MIT Apologizes, Permanently Pulls Offline Huge Dataset That Taught AI Systems to Use Racist, Misogynistic Slurs. A spicy title. So in summary, MIT actually took offline a highly cited dataset called 80 Million Tiny Images, which was created in 2008.

And this dataset trained a lot of AI systems to, quoting the article, "potentially describe people using racist, misogynistic and other problematic terms."

So the training dataset was used to train models to identify and list people and objects in still images, very similar to ImageNet. And it includes images and labels describing the content of those images. And researchers found that thousands of those images were actually labeled with racist slurs for black and Asian people in the database.

For example, pictures of Black people and monkeys were actually labeled with the N-word. Women in bikinis or holding their children were labeled whores. And parts of the anatomy were labeled with just very, very crude terms, et cetera. So needlessly linking everyday imagery to slurs and offensive language,

essentially baking that prejudice and bias into future models, is bad. It's very, very bad. And so this dataset that was collected in 2008 probably was not sifted through very carefully. And even if that was okay then, it's definitely not okay now. Yeah, definitely. So this happened after...

This paper actually found these problems in the dataset, and this was done by Vinay Prabhu, who is the chief scientist at UnifyID, a privacy startup in Silicon Valley, and also, I think, Abeba Birhane, I'm not sure how to say that, a PhD candidate at University College Dublin in Ireland.

They actually released the paper analyzing this dataset and pointing out all these issues.

And this is following on a previous event where there was a project titled ImageNet Roulette, which highlighted that in the entire ImageNet dataset, there are also some problematic labels. So I think this is kind of pointing the way for the AI community to be a little more careful about how the data sets are constructed, how data sets are filtered,

and generally what is in them going forward. Yes, this is very, very important and very concerning. And I've definitely looked through the ImageNet database. I haven't seen anything, I guess, offensive, though I've seen things that are definitely incorrect or a little bit inappropriate, actually. Some of the images seem a little bit inappropriate. So, yeah.

Yes, I definitely think we need to be much more careful in curating datasets, which I believe from last week's episode is what Yann LeCun suggested. Of course, we need to do much more than that, but it does look like the datasets are really off, especially if these are forming the basis of benchmarks that we're building our AI models for and towards. Yeah, and I guess one thing to point out also is that

This was created in 2008, this tiny images dataset, which was pretty early relative to kind of datasets as a whole. So ImageNet was created around then, and I think it was released something like 2006, 2007.

And it was pretty much the first dataset of its scale, I think the first attempt at a dataset with millions of images and millions of labels, the first attempt at scaling up to creating such a thing. So they had to sort of make up a procedure, and that involved scraping images and crowdsourcing labeling.

And hopefully since then, as we have created many, many more new datasets and it has become quite common to do for development, we have sort of learned better practices. And people have realized that you need to be more careful. So part of why this happened might be because this dataset is a little bit older, from before it was that common to create datasets of this scale.

And also I think another silver lining here is that the paper that pointed out the issue in the first place did come out. So there is expanding introspection by the community and expanding awareness that these are potential issues. The paper is titled Large Image Datasets: A Pyrrhic Win for Computer Vision?, and very quickly after this preprint was released on July 1st, there was this response of pulling down the dataset. So it does show that at least the community is trying to look out for these issues and address them as they come up.

But to move on to a slightly less anxiety inducing conversation, we have our next topic, which is on the topic of the National AI Research Cloud. So the article we're talking about is titled AWS, Google and Mozilla back National AI Research Cloud Bill in Congress. And this one is from VentureBeat.

And it describes how a total of 20 organizations, including AWS, Google, Nvidia, and more, joined schools like Stanford and Ohio State in calling for a national AI research cloud that would allow researchers in academia to access compute and datasets

that are currently only available to huge companies like Google. So there's this strange imbalance where companies like Google and Facebook have massive, massive computing resources and run experiments that are truly astounding. And even larger universities like Stanford or Berkeley or other universities that have computational resources still have far fewer resources.

So this initiative is basically proposing that there be a national cloud so that researchers who don't have such resources can access that scale and do that kind of research.

I wonder, Sharon, how much compute do you use? Have you had to try and scale up to any massive amount of computation at all? Or has it mostly been like single GPUs? Oh, man, what a great question to ask at a great time. I actually just built my own deep learning rig because I can't handle how bad the compute is.

But we do have 100 GPUs. They are from five years ago, though, so I wouldn't say they're nearly as high-performing as the ones now. So, yeah, this is really exciting. I'm curious how it'll go. Of course, I think it sounds very promising.

I think it could actually help with driving research forward, as I imagine this would largely be backing researchers through maybe government grants or credits in some way to use this cloud. However, I am also a bit concerned,

a little bit skeptical, about how it's going to work logistically and in terms of usability. The government creating some kind of cloud interface does frighten me, and I wonder how they would manage that, especially when all the different cloud players are part of this. So whose platform would you use, or are you going to create your own? And both of those sound

like there will be different types of roadblocks. One will be technical, if the government creates their own cloud, and the other one sounds a bit political, among the different cloud providers. So I'm curious to learn more about where this will head next.

Yeah. But the article does say, you know, leaders at Stanford actually joined more than 20 other universities in sending a joint letter about this to President Trump and Congress last year, backing this national research cloud.

Yeah, from our perspective, I think it's fun to note that actually the directors of the Stanford Human-Centered AI Institute, Dr. Fei-Fei Li and Dr. John Etchemendy, were the first to propose it. And I guess my hope is, if academia and industry and people who basically need this sort of cloud

formed the push and started it, then hopefully, if it does get implemented, those parties will be present to make it usable and make it useful. And I definitely do sympathize with you in that I do have access to a compute cluster with some amount of GPUs and some amount of CPUs. And I've had to scale up to many machines

in parallel for my research. And I've been able to get by with what I have, but at the same time, I've definitely also hit, you know, kind of the upper limit on what I can use

And I had to sort of scrape together my own set of scripts and techniques for using the cluster. So maybe if this does happen, you know, there will be more of a standardized interface and more support for a particular way to launch machine learning and AI jobs on large cloud compute. Yeah, potentially this will be it. So we shall see. I wonder what's going to happen close to deadlines. Yeah.

Yeah, no, there are going to be a lot of issues if many universities are trying to use this at once. How are you going to try and allocate resources? It's hard enough to do it within one university or one set of labs. So I guess, yeah, let's see. And let's hope that we do pull through with something that we can use to do more exciting research.

So our last article sounds counterintuitive. It is from Vox, titled How Deepfakes Could Actually Do Some Good.

So as a reminder, deepfakes are made with AI generative models that are able to essentially create fake-looking images, fake-looking videos, even fake-sounding audio. So essentially, they can create a mask that is completely fake; they can create fake people from images.

And the summary from this article was that the LGBTQ population in Chechnya actually faces quite significant persecution. And there's a new HBO documentary titled Welcome to Chechnya that lets survivors actually share their stories by using deepfake-like technology to conceal the survivors' identities.

And they do this by overlaying volunteers' faces onto the survivors on camera. And so the goal is to retain the emotion of the subjects who are speaking, so actions like blinking and moving their jaw, having them smile or be very sad, while also communicating what they're saying.

And this is a huge push for people to use deepfake avatars to essentially cloak themselves. And there are startups doing this, like D-ID and Alethea AI.

And this is a possible emerging use case or industry for synthetic media, essentially enabling anonymity. And this will likely impact how we also regulate deepfakes, if it can actually be used for good in some sense. Andrey, what do you think of this? I think, I mean, this is pretty cool, obviously. And it does speak to an argument that some researchers have had, which is like, why do you even develop deepfake

technology, or why do we democratize the ability to make deepfakes? And one of the answers has been that, well, there are actually very useful applications of this technology that aren't negative. And I think this is a great example of one of them. I was also curious, or found it interesting, to see another article

titled Deepfakes are Becoming the Hot New Corporate Training Tool.

from Wired, which had a sort of related topic of how companies are now using kind of deep fake-esque technology for, let's say, harmless use cases. So creating images for marketing purposes with greater diversity that maybe small companies cannot afford because you would have to hire many different people, something like that.

In combination, I think these are interesting articles showcasing some positive applications of this otherwise, let's say, I don't know, scary sounding technology.

Right. Definitely. I think up until now, people have approached deepfakes with caution, obviously showcasing their worst side through political manipulation. But I think these applications definitely start to shine through as enabling voices to speak out,

but also enabling people to maintain that anonymity, especially if it's for their safety. And I think that is very powerful. Obviously it's a fine line, so we'll be thinking about this as we move forward with regulation in this space. Yeah, but it is very promising, and it is

very great that this is definitely a way forward. And actually, in the article, they point out that women have used virtual masks on Snapchat, some of which are powered by AI generative models, and of course also very powerful graphics models, to share their experiences of sexual assault through video without revealing their identities. And I think that is also a really powerful application.

Yeah, yeah. To your point of needing to take this into account in regulation, I also think it's interesting that the article itself notes that currently there is no federal law regulating the production of deepfakes, although people are already looking into it and thinking about it. And so...

This article, to quote it, says, "As technology becomes more prominent, we should expect more people to argue for legitimate use cases or, at the very least, applications that are not as terrifying as the deepfakes we were more familiar with. And that will inevitably complicate how we choose to regulate them."

So yeah, on the whole, clearly this is positive. And it also points to the difficulty of reckoning with the negative applications. You can't just detect any deepfake and kind of delete it, maybe because there are useful instances of using it. But at least it's good that we can see some positive applications of AI and not more sort of scary dystopian stuff as we've been talking about lately.

Right, definitely. And of course, anonymity has always been a double-edged sword. So anonymous chat apps and rooms have, I've seen, been very good for some groups, and of course a safe space for many people, but have also enabled quite a bit of harassment and abuse across the internet and have been quite toxic as well. So

It definitely is a fine line of safety and also security, I would say. So obviously no laws yet, but this does make the line more gray. And I'm glad people are looking into significantly more positive applications of this technology.

And on that note, thank you so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at skynettoday.com. Subscribe to us wherever you get your podcasts and don't forget to leave us a rating if you like the show. Be sure to tune in next week.