Hello, and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what is just clickbait headlines. This week we'll look at the state of AI for diagnosing COVID-19 cases, then we'll discuss some recent work on AI fairness, and finally the connections between academic and industry groups in Seattle and Israel's AI scene.
I am Andrey Kurenkov, a third-year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation. And with me is my co-host. I'm Sharon, a third-year PhD student in the Machine Learning Group here at Stanford, working with Andrew Ng. I do research on generative models, improving generalization of neural networks, and applying machine learning to tackle the climate crisis.
And Sharon, I gather you've had a bit of a busy week, because right now the ICLR 2020 conference is going on, the International Conference on Learning Representations, which is a big AI conference and for the first time is fully online, fully virtual.
So how has that been going? How have you enjoyed attending it as a fully virtual conference? It's been really, really cool, actually. I really enjoy the fact that they put a lot of the talks up online beforehand, so you can watch them at your own leisure. I'm also co-organizing a workshop on tackling climate change with AI, and we're actually running it as a five-day workshop,
with each day focused on a particular area of climate change. So that's been very exciting, but also fairly exhausting, catering to all the time zones out there and coordinating all sorts of Zoom calls. Doing things in real time has been quite challenging.
But actually, one cool benefit of all of this is that the panelists, when we do have panels, are more comfortable and talk amongst themselves a bit more than in an in-person setting, where there's an audience in front of them and people get a little bit nervous. So panelists have been less nervous and have started to talk amongst themselves more, and it's been fantastic to see that.
Interesting. So have you been spending even more of your time on Zoom and various video calls than in prior weeks? More than I would like. As someone said, everyone is Zoomed out by now,
but I prefer Zoom over all the other platforms. I feel like Zoom etiquette no longer requires your video to be on, since people are probably descending more and more into pajama wear. That said, Slack calls have actually been really nice for doing work, because Slack has this cool feature where you can draw on the screen of whoever is sharing. So that's been pretty cool. That's good to know; actually, I didn't know that. So, enough about ICLR. Please check out the videos online and learn more about the cool research being done at ICLR this year.
Now, on to our topic of COVID-19 and AI. We will dive into a paper titled "Prediction Models for Diagnosis and Prognosis of COVID-19 Infection: Systematic Review and Critical Appraisal."
While many researchers have rushed to help combat the COVID-19 crisis using AI tools, this paper has shown that dozens of those models are actually quite biased: they do not use patient data that is representative of the populations infected by the virus, they rely on poorly annotated data, and they don't benchmark against established machine learning models.
They also found that these models only predicted outcomes for patients who had either died or recovered from the virus, but not for those who remained symptomatic, so only a subset of those with COVID-19. So basically there has been quite a bit of criticism, and this paper is a systematic review of all of these models aimed at understanding where their faults lie. What's really interesting is that in a commentary that accompanied this paper in the British Medical Journal, the editors actually said the models were so "uniformly poor" that none of them can be recommended for clinical use. Yeah, so pretty, pretty interesting.
Interesting results from the survey that really showcase that, as we rush to help with AI and develop models for diagnosis, there are a lot of ways to get things wrong, and rushing models out can actually do more harm than good.
So we've talked a lot about how various groups are trying to develop models, but this review is quite interesting for actually evaluating them more closely and showing all of these flaws that seem pretty common. I wonder, Sharon, as someone more aware of AI in medicine than I am,
are these kinds of flaws in AI research on diagnosis from images something you've seen before? Have there been other surveys evaluating the state of the art? I haven't seen systematic surveys so much; what I have seen is that papers that are peer reviewed have significantly fewer of these flaws, though the flaws do exist, and hopefully people actually present them as limitations of their work.
But very often, a peer-reviewed journal would prevent a lot of these flaws from actually being published, whereas I think most of these papers were probably published in non-peer-reviewed venues, since the authors wanted to get something out very, very quickly to help with the situation. There's a recent article in the journal Cell from a Chinese group using about 6,000 chest CTs, which have been released to the public: about 2,000 scans of positive COVID cases, about 2,000 of other pneumonias, which could be confused with COVID and which we need to make sure we can differentiate, and about 2,000 normal scans from people who don't have any of these pathologies.
So this is just coming out now, and it's clearly peer-reviewed since it's being published in Cell. But it obviously takes time to go through peer review, to mitigate these flaws, and to get reviewer feedback like, hey, you need to actually make sure the distribution you are training on is representative of your test distribution, for example, or of what you're claiming in your paper. Yeah, so as you say, this is a good example of why peer review is quite important and why we need research to sometimes be slow. Actually, to be a little more specific, this survey looked at papers from PubMed and Embase through Ovid, as well as bioRxiv, medRxiv, and arXiv, which are preprint websites. So these are places where you can publish research without necessarily having it be peer-reviewed. And the survey includes papers that were released between early January and late March.
So on the one hand, I guess it's not too surprising that these models are flawed, because researchers were working with whatever data they could get at the time, and they were releasing on sites like arXiv to get things out as soon as possible.
On the other hand, it's interesting to question whether it's even useful to rush things out if it ultimately turns out that they have these critical flaws and cannot be operationalized. Yes, it's always this tension between speed and accuracy. Ideally we could have both, but sometimes prioritizing speed means neglecting certain things, and also having less data available at the beginning.
I should hope that these papers do state that they are not meant for clinical use, but that they are pushing some kind of result that might be helpful for other researchers moving forward. That's my hope, but I'm not sure. Yeah, at least this can serve as kind of an example and a
moment to reflect and update how we do things. This particular survey also recommends that machine learning researchers adopt what is known as the TRIPOD checklist, a 22-item checklist meant to improve the reporting of prediction models for diagnosis and prognosis in medicine.
So essentially it's a best-practices checklist, developed by physicians and data scientists, to make sure that you don't have bias in your data and things like that. Now that we have this result showing that many who rushed to get models out right away made some crucial mistakes, maybe the set of best practices can be adopted so that, by default, people avoid these mistakes. Although, of course, peer review also needs to be there to catch them.
Yes, definitely. I'm looking at the set of checklist items in TRIPOD, and some of them are, I would say, a bit broad, so it's hard to say exactly whether they would help. But I imagine some of these would actually help, because as you go through, you realize, oh, I actually didn't think about this. For example, one of the items is to clearly define all predictors used in developing the multivariable prediction model, including how and when they were measured. And perhaps you forgot, oh, when they were measured, I didn't put that in. So I think
this framework could help jog people's memories as they add these details into the paper, and also help them think through the limitations as they work through the checklist. In core AI and machine learning research, we have some of these checklists in place; for example, when submitting papers to NeurIPS, you have to go through a checklist of various things about your model, such as whether or not you're going to release your code,
and whether or not you included, for example, the number of runs you did and the standard deviations across them, and things like that. So that does exist in the core ML community, and it's really interesting to see something similar applied here.
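As a minimal illustration of the kind of reporting Sharon describes, here is a small Python sketch (the metric values are made up) that summarizes results from several runs as a mean and standard deviation:

```python
import statistics

# Hypothetical test accuracies from five training runs with different random seeds.
run_accuracies = [0.912, 0.905, 0.921, 0.899, 0.917]

mean_acc = statistics.mean(run_accuracies)
std_acc = statistics.stdev(run_accuracies)  # sample standard deviation across runs

# Report as "mean ± std over N runs", as reproducibility checklists suggest.
print(f"Accuracy: {mean_acc:.3f} ± {std_acc:.3f} over {len(run_accuracies)} runs")
```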
Yeah. So I guess, on the whole, this is yet another example that AI isn't some sort of super advanced thing that always just works; we need to be careful as practitioners. It's not as if AI is human-level and can automatically avoid these mistakes, and you really need to be careful with the data. But to move on to another topic, we've had enough COVID talk yet again.
Next, we're going to talk about an article from Medium covering an interview, a discussion, with Sandra Wachter on why fairness in the EU cannot be automated. Sandra Wachter is a faculty associate at the Berkman Klein Center, a visiting professor at Harvard Law School, and an associate professor and senior research fellow working on law and ethics of AI. So essentially a person who has a lot of credentials for this conversation on ethics, law, and AI. The Berkman Klein Center actually released an interview with her on her thoughts about AI regulation and why fairness cannot be automated. And the short version, which is quite interesting, is that
the law, it turns out, is written to be pretty contextual and pretty flexible. One quote: "The Court of Justice does not stick to consistent methods or metrics for assessing non-discrimination cases. In fact, the case law in European legislation embraces what we call 'contextual equality.' Laws and case law are purposely agile and fluid to offer appropriate legal responses in a constantly changing society."
So coming from someone with expertise in this, it seems like a very interesting thing to point out. I wonder, Sharon, have you seen notions like this before, about why fairness metrics cannot be automated? Because it's definitely new to me.
I definitely think that what she is stating here is very valid in the sense that AI right now, especially supervised learning, requires a pretty well-defined ground truth to do well.
And even when that ground truth is slightly shaky, we have to go to all sorts of lengths to make it work. For example, in medicine, one doctor often cannot provide the ground truth on their own, because doctors disagree with each other. So we might take assessments from three, eight, or ten different doctors for the same X-ray, for example, and get their sense of what the diagnosis is. I know that sounds a little worrisome, but that is actually the reality of how we interpret these things. And this is interpreting something that does have an underlying scientific ground truth, right? This person does have some kind of pathology, for example pneumonia. But we would still have to go through doctors, and they disagree. And even then, we have to consider: do we take the average of what the doctors say? Do we take the mode? Do we weight the doctors based on their expertise? How do we really combine this to supervise the AI algorithm? (A small sketch of this kind of label aggregation appears below.) What Wachter is saying is that we can't do that when
things are highly contextual and require, in that context, a large amount of information, so much that it is hard to encode it into the AI from human expertise. It relies on intuition and all these sorts of things that are hard to encode and thus hard to turn into a ground truth, especially as these systems evolve over time. And this definitely makes me think about a talk that I watched today at ICLR by Yann LeCun, the recent Turing Award winner for his foundational work in AI. He basically said that the way neural networks and deep learning work right now, these are very deterministic systems.
But he believes that the future is self-supervised. We've talked about self-supervision a bit in previous episodes; essentially, it's the model supervising itself, learning from various aspects of the data it has already been given, on its own,
and building models, like latent variable models, that are much more dynamic and not as deterministic. What Sandra Wachter is saying here really reminds me of that: this is hopefully the direction AI will take, so that it can potentially start to aid in some of these choices, unless we decide as humans that we never want this to be automated. Yeah.
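As a minimal illustration of the label-aggregation question Sharon raises above, here is a small Python sketch, with hypothetical doctors, diagnoses, and expertise weights, comparing a simple majority vote with an expertise-weighted vote:

```python
from collections import Counter

# Hypothetical diagnoses for one chest X-ray from several doctors.
readings = {"dr_a": "pneumonia", "dr_b": "pneumonia", "dr_c": "normal"}

# Hypothetical expertise weights (e.g., based on years of experience).
weights = {"dr_a": 1.0, "dr_b": 0.5, "dr_c": 2.0}

# Option 1: simple majority vote, i.e. the mode of the labels.
majority_label, _ = Counter(readings.values()).most_common(1)[0]

# Option 2: expertise-weighted vote, adding each doctor's weight to their label.
weighted_votes = Counter()
for doctor, label in readings.items():
    weighted_votes[label] += weights[doctor]
weighted_label, _ = weighted_votes.most_common(1)[0]

print("Majority vote:", majority_label)  # pneumonia
print("Weighted vote:", weighted_label)  # normal, since dr_c carries the most weight
```

Which aggregation rule is appropriate is itself a judgment call, which is exactly the kind of contextual choice the interview argues is hard to automate.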
Yeah, yeah. So to that point of there being a lot of data to consider in making a decision, one of the points raised in this interview, to quote, is that "many of the concepts fundamental to bringing a claim, such as the composition of the disadvantaged and advantaged group, the severity and type of harm suffered, and requirements for the relevance and admissibility of evidence, require normative or political choices to be made by the judiciary on a case-by-case basis."
So even the evidence admitted needs some human review. So it also ties in to the medical context in that the human professional needs to interact and work with the AI system when making a decision about data and its output. And it cannot just be fully automated.
I'd also note that Wachter, in this interview, is making kind of two points: one, that the regulations, the law itself, need to be updated a bit to be less contextual, to allow for some regulation; but also that we need flexibility and still need to be able to take things case by case to some extent, and not take it too far. Yeah, I do have one last thought. Okay, so
it's great that we have experts like this thinking about these topics. It definitely reminds me of something a friend of mine who has a law degree was telling me: when he approaches people and first talks about what AI could do for law, people get really nervous and think of all sorts of biases. But when he begins a conversation with how biased humans are, and then transitions to how AI can potentially mitigate some of that bias, people are very receptive
to integrating AI. So a lot of it is how we frame what AI is going to do and what its role is. And it will be challenging, as Wachter points out, to compare what a human does versus what an AI does and which one is more biased; I think that will be very, very difficult to quantify. But enough about ethics and the meta-discussion around that; let's dive deeper into surveillance. What is interesting here is that there is a Seattle Times article called "A Tale of Two AI Cities: The Seattle Connection to Israel Surveillance Network."
What's really interesting here is that while you might be aware that countries such as Israel are deploying AI to surveil their citizens in order to curb the spread of the coronavirus, you might not be aware that Seattle's and Israel's AI ecosystems actually have very close ties, as a result of Microsoft and Amazon acquiring several Israeli startups and of strong academic connections with the Seattle-based AI institute AI2. For example, the article says, quote: "To get into Israel from Palestine, facial recognition systems installed at Qalandia and 26 other checkpoints last summer have drawn the ire of human rights advocates.
The facial scanners were developed by Israeli artificial intelligence security startup AnyVision, which has ties with Redmond-based Microsoft. Microsoft's venture capital fund, M12, came under fire for participating in a $74 million investment in the AI security company last June. AnyVision has not responded to repeated requests." So basically, Seattle is home to a number of academic and industry groups with very close ties to the Israeli AI ecosystem, and this seems to be spurring the adoption of AI surveillance technology worldwide. Not surprisingly, this is causing controversy. So while these AI surveillance systems are being deployed in Israel and not in the US, the very close ties are there.
Yeah, so we've talked a bunch about Clearview and how it's developing facial recognition tech in the US. And I suppose this article points out that there are additional companies in Israel working on this. So they've developed these checkpoint scanners and there's some worry that this will lead to larger scale surveillance. And although...
this is just saying that there are ties, connections, between large companies like Microsoft and these firms in Israel, I suppose it is interesting to note that this kind of network exists. Presumably having the ties does lead to influence, and if there is more development of surveillance in Israel, it might make it easier for it to spread in the U.S. as well.
I wonder, Sharon, were you aware that Israel has such a large number of AI companies and is developing so much new tech? I actually was, mainly because of ICML last year. Oh, no, sorry, it was not ICML. Okay, redo: I actually was, because of a conference called UAI, the Conference on Uncertainty in Artificial Intelligence, which I attended last year in Israel, in Tel Aviv.
And that's when I learned about all of these Israeli startups and the AI research being done there, and there's quite a bit of AI research being done in Israel. Every now and then I'm surprised to find a company on Crunchbase that is based in Israel, and I've been seeing more and more of that. But that trip really made me realize, oh, this is a big hub for AI. They had actually requested that Andrew Ng, my advisor, speak at their large, nationwide conference about AI. I just didn't realize the extent to which they were an AI hub. Yeah, the extent is also new to me. Although, fun fact, I used to live in Israel. I was there for seven years, during elementary school, and I still have family there.
So I was aware that Tel Aviv is a tech hub; there are a lot of software engineers and startups there. But this article notes that in 2018, AI companies raised nearly 40% of all venture capital investment in Israel, according to a Start-Up Nation Central report. So it is pretty interesting to see that there is such a large amount of industry activity in Israel. And because there are also ties to Seattle, it's worth keeping an eye on it and being aware of how developments there might influence what's happening here.
One interesting fact pointed out by the article was that last fall, Microsoft actually hired former U.S. Attorney General Eric Holder to audit AnyVision, the Israeli AI security startup we mentioned, the one Microsoft's venture capital fund participated in the $74 million investment in. They did this because they were concerned about the company's monitoring of Palestinians, and the audit found that AnyVision, quote, "does not currently power a mass surveillance program in the West Bank that has been alleged in media reports." So Microsoft did care about this very much, and this is according to a March 27th statement. But still, Microsoft's M12 declared that it would divest its shares in AnyVision because it can't exercise oversight or control of the technology,
which I suppose is worth noting in the sense that, while there has already been a lot of scrutiny of what companies are doing here in the U.S., Microsoft or Amazon can also be investing in surveillance tech and its development in Israel. So we should, of course, be mindful that AI is international now, that these companies are international, and that they can influence the development of AI in other ways.
And another thing this article notes is that, along with the strong industry ties, there are also a lot of academic collaborations on AI research, for instance between the University of Washington and Israeli institutions like Bar-Ilan University and Tel Aviv University, which are also hubs of research. So in general there is this relationship, and I suppose it is just interesting to note that AI research and development are pretty international, and that ties are becoming closer between different large cities.
And with that, thank you so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at skynettoday.com. Subscribe to us wherever you get your podcasts and don't forget to leave us a rating if you like the show. Be sure to tune in next week.