
Hate Speech, Applied AI, NYPD, & Grades

2020/8/26

Last Week in AI

People
Andrey Kurenkov
Daniel Bashir
Sharon Zhou
Topics
Daniel Bashir: This week's news focuses on the challenges Facebook faces in using AI to detect hate speech, including the difficulty of identifying mixed-media content such as images and memes; the AI research community's insufficient attention to real-world problems and the social unfairness caused by algorithmic bias; the controversy over the NYPD's use of facial recognition technology; and the failure of the UK government's use of an algorithm to determine A-level and GCSE exam grades.

Andrey Kurenkov: Facebook has made significant progress in hate speech detection but still faces challenges, especially in handling mixed-media content such as memes. At the same time, many AI researchers focus on theoretical problems while neglecting real-world ones. Algorithmic bias also deserves attention. The UK government's use of an algorithm to determine exam grades shows that large-scale AI deployments need to consider fallback options and public opinion.

Sharon Zhou: Facebook's AI systems perform worse at detecting hate speech targeting minority groups, which highlights the problem of algorithmic bias. Memes are a major challenge for AI hate speech detection because image and text information must be processed together. In the incident of the NYPD using facial recognition to pursue a Black Lives Matter activist, the motivation and the actual role of facial recognition remain unclear. The UK government's decision to use an algorithm to determine exam grades was a mistake, because it ignored students' individual circumstances and led to unfair results.


Chapters
This chapter discusses Facebook's efforts to use AI for detecting and removing hate speech and misinformation, highlighting challenges such as the complexity of mixed media content like memes and the AI's uneven performance across different demographics.

Transcript


Hello and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what is just clickbait headlines. This week, we're going to try a slightly new format by starting out with our Last Week in AI digest of last week's news stories and then following up with a discussion between me and fellow PhD Sharon.

We'd love it if you could give us feedback on this new format over at bit.ly/LTAsurvey. Again, please just fill out this quick survey at bit.ly/LTAsurvey. And now over to Daniel Bashir for our first segment, summarizing the news stories we shall be discussing.

Hello and welcome. This is Daniel Bashir here with Skynet Today's Last Week in AI. This week, we'll look at Facebook's fight against hate speech, real-world AI, the NYPD's targeting of a Black Lives Matter protester, and a follow-up on last week's A-level story.

First off, one thing about Facebook we're well aware of is its massive reach. Only in 2017 did Facebook begin to own up to its ability to deliver toxic speech, propaganda, and misinformation to millions of users. Facebook has invested heavily in controlling toxic content, outsourcing content moderation to reviewers around the world. Now, Facebook is working on AI to detect and remove hate speech and misinformation.

Fast Company reports that while Facebook has made significant progress, it has continued to deal with issues. A July 2019 document leaked by NBC News stated that Facebook systems flagged and proactively removed a higher proportion of hate speech posts targeting white people than posts targeting minorities.

But larger challenges lie ahead. Toxic content comes not only in words, but also in images. Facebook's researchers have crafted techniques for detecting toxic images, but they don't work so well, and Facebook's head researcher doubts that current approaches will produce systems that are good at detecting harmful images and efficient enough to run at a large scale.

Mixed media content, such as memes that combine images with language, is becoming the majority of content on Facebook and likely presents an even tougher challenge. For all the progress they've made, Facebook's battle with the depths of misinformation is just beginning. They'll have to sprint to keep up.

While Facebook's researchers are doing their best to tackle toxic content, others are living in an ivory tower. Hannah Kerner, assistant research professor at University of Maryland, College Park, writes for the MIT Technology Review that the machine learning community appears to view solving real-world problems as an endeavor of limited significance.

Natural language processing had a similar realization earlier this year. While the goal of AI is to push the frontier of machine intelligence, novel developments tend to manifest as new algorithms or procedures that often produce only incremental improvements on benchmark datasets. This form of progress encourages flawed scholarship and leaderboard chasing.

Meanwhile, the mere scent of application causes research to be marginalized at top AI conferences. This is a huge problem because machine learning is a promising way to advance health, agriculture, and many other areas. But instead of directing their brainpower toward solving real-world problems, many researchers are stuck in competitions that can feel like games. This is a problematic trend for other reasons too.

Prioritizing performance on benchmarks over real-world applications allows issues like AI bias to rise unfettered. Benchmark datasets are often not representative of the real world and contain biases that are transferred to the trained models. But researchers focused on leaderboards are unlikely to see these impacts or to create systems that will be genuinely helpful to people.

While AI researchers seem to chase the wrong goalpost, people who could use AI to create genuine change and address real issues aren't benefiting as much as they could. The field needs to ask itself once again, what is our goal? And if that goal involves making a concrete, positive impact on the world, then the field may need to rethink its position on applications.

Speaking of applications, the police have plenty. Last week, the NYPD besieged the home of 28-year-old Derrick Ingram, a prominent Black Lives Matter activist. Gothamist reports that the NYPD deployed facial recognition technology to find Ingram using a photo from his Instagram page.

Ingram was targeted by officers in riot gear during an hours-long NYPD raid on August 7th after allegedly shouting into a police officer's ear with a bullhorn during a June protest against police brutality. The NYPD's facial identification section uses facial recognition software to identify possible suspects in thousands of cases each year, drawing on a database of mugshots to generate possible matches, which are then analyzed by investigators.
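As a rough illustration of how this kind of mugshot-matching pipeline generally works (the NYPD's actual system is not public, and the helper names below are purely hypothetical), a face recognition lead generator boils down to comparing an embedding of the probe photo against embeddings of a mugshot database and returning the closest candidates for a human analyst to review:

```python
# Hypothetical sketch of embedding-based face matching. The NYPD's actual
# pipeline is not public; this assumes face photos have already been mapped
# to embedding vectors by some face recognition model.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_matches(probe_embedding: np.ndarray,
                mugshot_embeddings: dict[str, np.ndarray],
                k: int = 5) -> list[tuple[str, float]]:
    """Return the k mugshot IDs most similar to the probe photo.

    These are only investigative leads: a human analyst still has to review
    them, which is where mistakes and biases can creep in.
    """
    scored = [(person_id, cosine_similarity(probe_embedding, emb))
              for person_id, emb in mugshot_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The important point from the story is the last step: the software only proposes candidates, and the decision to act on a lead is a human and institutional one.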

A Buzzfeed investigation also revealed that the NYPD was a frequent user of Clearview, the controversial facial recognition firm that collects millions of photos of Americans without their consent. The NYPD states that they use facial recognition to gather leads on suspects for crimes. In Ingram's case, they say his shouting caused an officer temporary hearing damage and therefore was an assault.

Some find the NYPD's use of facial recognition concerning because it seems to silence dissent, while others believe that Ingram's actions went too far. Ingram's case in particular certainly seems to be a more difficult one, but we will have to continue to contend with the role of facial recognition in policing and how it affects us as citizens.

And finally, a quick follow-up on our story from last week. As you may remember, with students unable to take their end-of-year exams as usual, the British government opted to assign students taking the A-levels and GCSEs their scores using an algorithm. But, instead of considering individual circumstances, that algorithm prioritized maintaining a normal distribution of grades.

The algorithm also took schools' previous performance into account, which could drag down the scores of high achievers. The Guardian reports that Gavin Williamson, British Education Secretary, announced that the government would scrap the standardization model that awarded grades in lieu of exams and caused so much chaos. Apologizing for the distress the model caused, Williamson said that A-levels and GCSE results would revert to teacher-assessed grades.
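To see why individuals were dragged down by this kind of standardization, here is a toy sketch of the general idea, assuming a simple rank-and-redistribute scheme; the actual model used in England was more complicated, and the function below is purely illustrative. Students are ranked within a school by teacher assessment, and grades are assigned so the school's distribution matches its historical results, regardless of how any individual would have performed.

```python
# Toy sketch of distribution-matching grade assignment. The real model used
# in England was more complicated; this just shows why individuals get dragged
# toward their school's historical results.

def assign_grades(teacher_ranking: list[str],
                  historical_distribution: dict[str, float]) -> dict[str, str]:
    """teacher_ranking: student names ordered best to worst.
    historical_distribution: fraction of each grade the school got in past
    years, e.g. {"A": 0.1, "B": 0.3, "C": 0.4, "D": 0.2}."""
    n = len(teacher_ranking)
    grades, cumulative, boundaries = {}, 0.0, []
    for grade, fraction in historical_distribution.items():
        cumulative += fraction
        boundaries.append((grade, round(cumulative * n)))
    i = 0
    for grade, upper in boundaries:
        while i < upper and i < n:
            grades[teacher_ranking[i]] = grade
            i += 1
    return grades

# A strong student at a historically weak school is capped at that school's
# usual top grade, no matter what they would have scored themselves.
print(assign_grades(["Asha", "Ben", "Cara", "Dev", "Eli"],
                    {"B": 0.2, "C": 0.4, "D": 0.4}))
```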

It's good to hear that the British Department for Education is acting to fix the harms its algorithm caused. Let's just hope that this teaches us a lesson about making sure humans check AI's decisions in the future.

That's all for this week. Thanks so much for listening. Okay, so that was the summary of the stories. Thank you, Daniel, for producing that. And now we are going to start our discussion, where we dive a bit deeper into some of the details we find interesting and give the researcher take on these stories. I'm Andrey Kurenkov, a third-year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation in my research.

And with me is my co-host. I'm Sharon, a third year PhD student in the machine learning group working with Andrew Ng. I do research on generative models and applying machine learning to tackling the climate crisis.

Great. So let's go ahead and dive in to discussing the news stories we had last week, starting with the story "Facebook's AI for detecting hate speech is facing its biggest challenge yet," which was on FastCompany.com. So as you heard from Daniel, the story basically covered Facebook's ongoing efforts to moderate toxic content and the need for using AI to detect hate speech. And we're going to dive into a lot of the more interesting bits beyond the basic summary.

Something I found interesting was actually the quantity of progress. In the second quarter of this year, Facebook reports that it took down 104 million pieces of content that violated its community standards.

And 22 million of those were hate speech, compared to 9.6 million pieces of hate speech in the first quarter and just 2.5 million two years ago. So there's a lot of either increasing hate speech or increasing detection and, I don't know, handling of hate speech. How about you, Sharon? What details of this article seemed interesting to you?

One thing that stood out to me was definitely the memes: the fact that memes are actually a really, really big challenge for AI, because the machine has to look at both the textual content of the meme and the image associated with it. And Facebook has a Hateful Memes Challenge to detect whether a meme is hateful or benign. For example, say the question is: is this meme mean, or is it nice? And the text says, "love the way you smell today." If it's a picture of a rose, that's fine. If it's a picture of a skunk, that's less fine. You know, it's sarcastic. So I think detecting multimodal sarcasm is really challenging and not something I had thought about until this challenge was put up. Of course, I've seen memes that seemed obviously slightly inappropriate, but definitely with sarcasm; in fact, a lot of them have that. So I would say memes, and really images in general with some or a lot of text associated with them, present an interesting challenge.

Yeah, we were talking a little bit before recording about how it's very interesting that this Hateful Memes Challenge exists, which I also did not know about. It seems a bit similar to the deepfake detection challenge Facebook had last year, a way for them to drive more research into the area. So apparently memes that are hate speech are a big problem, and hopefully AI can play a role in helping with that.
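For a concrete sense of what "looking at both the text and the image" means, here is a minimal, hypothetical late-fusion sketch in PyTorch; it assumes the meme image and its overlaid text have already been turned into embeddings by pretrained encoders, and it is not Facebook's actual system or a Hateful Memes Challenge baseline.

```python
# Hypothetical late-fusion classifier for memes: encode image and overlaid
# text separately, concatenate the embeddings, and classify. This is only a
# sketch of the general idea, not Facebook's actual system.
import torch
import torch.nn as nn

class MemeClassifier(nn.Module):
    def __init__(self, image_dim: int = 512, text_dim: int = 512, hidden: int = 256):
        super().__init__()
        # In practice the embeddings would come from pretrained encoders
        # (e.g. a CNN/ViT for the image, a transformer for the text).
        self.fusion = nn.Sequential(
            nn.Linear(image_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # hateful vs. benign
        )

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([image_emb, text_emb], dim=-1)
        return self.fusion(fused)

# "Love the way you smell today" paired with a rose vs. a skunk: the same
# text embedding combined with different image embeddings can produce
# different predictions, which is exactly what a text-only model cannot do.
model = MemeClassifier()
logits = model(torch.randn(1, 512), torch.randn(1, 512))
print(logits.shape)  # torch.Size([1, 2])
```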

One more thing I found interesting, and perhaps unsurprising, but worth noting and something I wasn't entirely aware of before reading this article, was the leaked July 2019 report obtained by NBC News showing that Facebook's AI systems were better at detecting hate speech targeting white people than at detecting hate speech against minorities and marginalized groups.

So just yet another example of actual real-world AI not working as well for minority and marginalized groups. And another reason that we should really be aware of that as a possibility and, as researchers and practitioners of AI, work against these kinds of inequalities.
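One concrete habit that follows from this, sketched below with made-up field names since the leaked report's methodology isn't public, is disaggregated evaluation: report recall separately for each targeted group rather than a single aggregate number that can hide large gaps.

```python
# Sketch of disaggregated evaluation: compute recall of a hate speech
# classifier separately for each targeted group, instead of one overall
# number. The field names here are illustrative, not any real schema.
from collections import defaultdict

def recall_by_group(examples: list[dict]) -> dict[str, float]:
    """Each example: {"targeted_group": str, "label": bool, "predicted": bool}.
    Returns recall on truly hateful posts, per targeted group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        if ex["label"]:  # only truly hateful posts count toward recall
            totals[ex["targeted_group"]] += 1
            hits[ex["targeted_group"]] += int(ex["predicted"])
    return {group: hits[group] / totals[group] for group in totals}

# A gap like the one described in the leaked report would show up directly
# here, e.g. {"group_a": 0.8, "group_b": 0.5}.
```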

And on to the second article, "Too many AI researchers think real-world problems are not relevant," from MIT Technology Review. A spicy article that Daniel summarized: basically, it argues that the AI community thinks solving real-world problems is not worth its time, or of limited significance compared to more theoretical, more general problems; an ivory tower issue, so to speak.

Something I found really interesting about this is that I might push back on the article a bit: two really big areas that were seen as just applications before are computer vision and natural language processing, NLP. I think both of them were seen as fringe, application-driven kinds of things, but they grew into fields that the AI community does care about. I do think that AI going out into the world is happening faster than people had imagined, and as a result that conversion isn't as fast as it could be. But I do think a lot of applied problems could be the ones that bring up new issues and drive some of that research. And certainly, companies applying AI will see problems that we otherwise wouldn't see, like Facebook with the memes; I would never have thought of that. Yeah, so I think I'm still more hopeful than this article, I guess. What do you think, Andrey?

Yeah, I also am a bit skeptical of the point being made here. The article's author is Professor Hannah Kerner, who researches machine learning methods for remote sensing applications in agricultural monitoring and food security. So certainly she is part of the research community, and she actually starts her article by noting that

a review she received for a NeurIPS submission, which stated: "The authors present a solution for an original and highly motivating problem, but it is an application and the significance seems limited for the machine learning community." That review is used to motivate the discussion of too many researchers thinking real-world problems are not relevant, along with the claim that there's too much emphasis on making progress on benchmarks. And certainly, to some extent, that's true.

But at the same time, I think the article doesn't mention a lot of ways in which applications are done in AI research. For instance, other conferences, like WACV, or robotics conferences such as RSS, actually do have specific tracks for applications and even awards for applications and systems.

So for one, I think it's important to note that, and also that NeurIPS as a conference is perhaps a bit more theoretically leaning in general, so applications are less of a good fit there. And I would say generally a lot of AI is applications: it's object detection, it's captioning. Maybe it's a small set of agreed-upon applications, and in general it's hard to make a push for a new one.

But it's also true that a lot of AI research is applied. So it's good to be aware of this point, and of the fact that some papers get reviewed and not accepted because they don't target an established problem and benchmark. And of course, reviewing is flawed in all sorts of ways. But there are reasons to not fully agree with this, as you say, Sharon.

Yeah, because I could see the newly motivated problem of memes being dangerous becoming a new problem in AI, and that could be motivating for us. I will say some areas that are seen as largely applications right now are medicine and climate change, the latter of which is mentioned in the article. For medicine, actually, a lot of papers on medical applications are accepted to NeurIPS, so I do think that is there. I think climate is a little bit different; it has been harder, and it has been pushed through workshop channels, which I've helped organize: the ICLR climate change workshop, and we also have a NeurIPS one. That's where we've been fostering more research in that space.

Of course, all the healthcare work I've seen at NeurIPS motivates itself in such a way that there is still a new kind of architecture being put out that could be useful for other things as well, even if the problem was found and motivated in healthcare. And I think that is the right type of application to push to NeurIPS and for people there to actually appreciate.

So yeah, I've actually found that NeurIPS is more welcoming than I expected, to be honest. That being said, pushing on leaderboards is a huge problem. Though I will say it's also important to have some of those things, because then people have clear goals. It's just that we need to reevaluate our goals every now and then.

Maybe. I mean, the article is titled "Too Many AI Researchers Think...", so I guess the question is how many is too many? And it is fair to say that maybe we should have more venues for pure application research, where there's no new architecture and no new algorithm; it's just solving a real-world problem and showing that it can be solved using known techniques.

Certainly at a lot of conferences, that kind of paper, where you're using something known and just applying it to a new thing, is harder to get accepted and appreciated. But I do think this article could have been more nuanced and made those points while addressing the kind of feedback we've brought up, I suppose. And on to our next article. This one is...

pretty concerning and actually kind of weird, and it was surprising to me. So this article is "NYPD used facial recognition technology in siege of Black Lives Matter activist's apartment," from Gothamist. And yeah, so the high-level story is that the NYPD apparently used facial recognition tech to justify this raid.

And the article actually said that an officer can be seen outside the apartment holding an informational lead report from the NYPD's Facial Identification Section. And the NYPD actually did confirm that they used facial recognition in this case somehow, but it's quite vague in what way and to what extent. So it's a pretty weird article to me. I don't know exactly what to make of this, except that

it seems like we should get more details. What do you make of it, Sharon?

Yeah, I think it's very unfortunate for Derrick Ingram. But I do question where the facial recognition part comes into play. Is the question whether, because of incidents like this, the police should have facial recognition technology, or more of it, or rather whether they shouldn't have it because this incident happened? It's not clear from the article exactly. Because I don't actually think facial recognition was really necessary here; maybe it did help support something, but they knew who he was, I think. I've even heard his name before. So I think the police knew who he was already and could probably easily find his address. So I don't know exactly where facial recognition plays in. I will say that this might be a testament to people drawing hype around facial recognition technology and police brutality, so it could be just trying to put everything together for a great media article. But I would say I'm unsure as well.

Yeah, it's unclear. And the article, I think, mentions that Dorothy Walden, an attorney for Ingram, is actually quoted as saying, "We look forward to uncovering what role, if any, this problematic technology played in the officers' decision to, without a warrant, raid the apartment." So it seems like maybe there was a vague connection that spawned a bigger conversation about this topic.

Apparently, Mayor de Blasio did sign the Public Oversight of Surveillance Technology (POST) Act, which would force the department to disclose more information about its surveillance capabilities. So perhaps this also indicates that we really should have clearer explanations of how this technology is used, so we know whether it's being used reasonably or whether it is being used to justify,

I don't know, raids that shouldn't really happen. Right. Well, on a better note, our last article is titled "A-level and GCSE results in England to be based on teacher assessments in U-turn," and this is in The Guardian. And again, this article is basically touching on the outrage that ensued

a bit ago around how the UK was going to essentially use an algorithm to assign people exam scores based on their historical results as opposed to actually being able to take those exams. And something like 40% of predicted results were downgraded. And this means that if students were marked down like two or three grades,

this could mean losing university places. And that is awful, especially if you didn't do anything wrong and the grade didn't even conform to what you would have actually gotten. So I think that would have been absolutely awful and would have increased the socioeconomic divide in the U.K.

Yeah, and actually this article notes that 40% of predicted results were downgraded, and a huge public outcry did follow. There were many tragic stories, and we were bewildered last week by why they would do this. Why entrust this huge decision to an algorithm that is obviously not going to be perfect?

Why even roll it out in this way? That clearly would lead to many people being disappointed and questioning it.

And so it's good that they reversed the decision, but this whole episode is still really bewildering. I mean, it's still not perfect, right? So now they're still not going to take the exam, because that risks COVID. But teachers will assess, based on, I suppose, students' past performance, what they should do next and what they're qualified for.

Okay, so now we have a human algorithm, or rather many very diverse human algorithms, that will be assigning this type of thing. And I could see a lot of teacher bribery happening. So I mean, there are pros and cons to both, I guess. And it's not perfect, but we'll see, because I think things like this might be more prevalent as we move forward. And it does make me grateful for the quote-unquote holistic review of people's abilities in the U.S.

Yeah, the article does note the counterpoint from the head of the Department for Education: the argument for the algorithm was that it would ensure results that are standardized across the country and consistent with prior years. So there is a case to be made for it, but it seems maybe they should have still considered

having an option B for people. And I guess the public outcry seemed inevitable when such a huge outcome is decided purely by an algorithm and without recourse. Perhaps it's a good example of how AI can be leveraged at large scale, and future attempts at this sort of thing can be done better.

Yeah, I definitely think this could be done better, in the sense that maybe in the future there could be a model that everyone kind of agrees upon, and it only looks at certain historical data, and you actually don't have the exam. I think having the algorithm actually predict your exam scores is a little bit difficult.

I don't know, something feels really off about that, because you're like, I didn't take this exam, stop giving me a score. So I think there's also an issue with the framing of it. Right. So, yeah, I guess good news for students whose grades got downgraded, hopefully, and

Ultimately, maybe good news. I think most students deserve to get better grades just for making it through the year and because they couldn't prove themselves on the A-level. Maybe they should get the benefit of the doubt instead of being forced into different bins and so on. Right. Yeah.

And with that, thank you so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at skynettoday.com. Please do fill out that survey on how much you like this format at bit.ly/LTAsurvey. Subscribe to us wherever you get your podcasts. And don't forget to leave us a rating if you like the show.

Be sure to tune in next week.