
GPT-3, Limits of Deep Learning, Deepfakes in the Real World

2020/7/26

Last Week in AI

People
Andrey Kurenkov
Sharon Zhou
Topics
Sharon Zhou: The use of AI in medical decision-making lacks transparency, especially with respect to explainability, which increases patients' concerns and distrust of AI. Current AI models have biases, lack transparency, and their performance and testing are poorly understood, all of which makes deploying AI systems in healthcare challenging. To build public trust in AI systems, models need to be audited and their performance metrics and reliability information made public. Doctors' explanations of their decisions to patients are also somewhat opaque, which parallels the opacity of AI systems in medical decision-making, so minimum disclosure standards need to be discussed. Applying AI systems in healthcare requires clear ethical norms and disclosure standards to protect patients' right to be informed and their safety.
Andrey Kurenkov: Doctors may over-rely on AI decision-support tools, which could affect their independent judgment and professionalism. In the paper evaluating trends in deep learning progress, some commonly used benchmarks (such as ImageNet top-1 error) may already be close to saturation, which complicates assessing deep learning's progress. Progress in deep learning increasingly depends on growth in compute, which has a large environmental impact and calls for more efficient deep learning methods. The research shows that deep learning progress is exhibiting diminishing returns: tiny performance gains require exponentially more computation. Many areas of deep learning remain under-explored, and significant progress may be possible in them without massive compute. Evaluating deep learning progress should not rely solely on accuracy on specific benchmarks; other factors such as data efficiency and compute efficiency should also be considered. The claim that deep learning is approaching its computational limits overstates the current situation; there are still many directions left to explore.


Chapters
Discusses the ethical dilemma of whether patients should be informed about AI's role in their healthcare decisions, highlighting issues of transparency, trust, and potential over-reliance on AI by healthcare providers.

Transcript


Hello and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what is just clickbait headlines. I'm Sharon, a third-year PhD student in the machine learning group working with Andrew Ng. I do research on generative models, improving generalization of neural networks, and applying machine learning to tackling the climate crisis. And with me is my co-host...

Hi there, I'm Andrey Kurenkov, a third-year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation in my research.

And Sharon, I think this week we have a lot of quite varied news stories, so it should be pretty interesting to chat about it all. And we'll just go ahead and dive straight in with the first one here, which is titled, An Invisible Hand, Patients Aren't Being Told About AI Systems Advising Their Care. And this was from statnews.com.

So the short version is machines that are completely invisible to patients are increasingly guiding decision making in the clinic. And this article summarizes how tens of thousands of patients in Minnesota's largest health care systems have had discharge planning decisions made with help from an AI model, but they have not been told about the AI model.

And apparently doctors and nurses have made a point of not mentioning it to patients, because the patients might worry and distrust the AI. And of course, the doctors and nurses don't want that because presumably they do trust these models.

The healthcare workers do emphasize that the final decisions are made by the humans, and the AI is just a tool that helps. But of course, there are still issues of whether using these decision-making aids counts as research if it's too exploratory, not developed enough.

And it's unclear if the patients deserve to know about it, if they would be worried about it, or if it's okay to not let them know. I guess, yeah, what's your take, Sharon? It seems like a pretty tricky area.

I think it is a very tricky area. It harks back to issues with AI and explainability. So explainability, meaning the AI can kind of explain how it reached its decision and give that to a human such that we can then trust the AI more. And these systems, it sounds like, don't have much of that, or don't make that part transparent. And it's a very difficult problem.

I would say the other side of this is also over-reliance on AI decision support tools by the user, and the user here is, I would say, the doctor. This view was somewhat expressed in the article, where three patients said they wouldn't want to know if their doctor was being advised by such a tool.

And a fourth patient spoke out forcefully in favor of disclosure. I find that quite interesting that many folks would not want to know if their doctor is actually being advised by a tool like this, that they just don't find that appropriate. But I also find it kind of concerning that...

Doctors would probably over-rely on a system like this. I have seen that play out in experiments we've run where we have an AI model say something's positive and then the doctor says, you know, I'm probably slightly more certain that it's positive because I see this AI system saying it's positive. But I haven't seen it flip a doctor's decision completely. So that is slightly better, but I can see it happening.

Yeah, I think those are a lot of good points. And I think at the current stage of AI development, the justification that the healthcare workers are using is that, you know, the patients might find out about it through the news anyway, and bringing it up just has the potential to become an unnecessary distraction and undermine trust in ways that they're trying to avoid, right?

But at the same time, we know that some AI models have biases even when deployed in production. We know that we don't have many tools right now to really understand models well. And in general, I think we've seen that a lot of times there isn't a lot of transparency about how well these models perform, how well they're tested, and so on.

And this reminds me a lot of discussions we had about, for instance, auditing, where companies would have to publish the metrics of the models they're using and how reliable they are and so on to then let people know what they should trust or how worried they should be about the approach this company is using.

So in general, I think, as you said, it's a complex topic, but I'm a little skeptical of this decision to not disclose it in any way and just use the AI tools anyway. Right, right. Absolutely. And I imagine there might be a host of reasons for not disclosing it that, you know, doctors might be concerned about. But I think we need to

think about, as we shape regulation, how to maintain that transparency, what kind of disclosure should happen, what kind of disclosure makes sense. Of course, doctors also withhold information and make some sort of disclosure decisions themselves when they give a diagnosis. And I have seen doctors change their diagnoses when another doctor is around, for example; it is influenced by the environment. So I can imagine that there might be a host of factors there too, but I really hope this does play into the discussion of regulation. Yeah.

Yeah, exactly. Related to that, a common kind of response to the topic of explainability in medicine is that, you know, doctors don't fully explain the rationale for their decision making to patients, right? You're supposed to just kind of trust your doctor as an expert. So in some sense, there's already some opaqueness as to why decisions are being made.

But at the same time, I think we have these processes of medical school and training and various oaths doctors have to take to presumably train them to act ethically and to inform patients of any necessary information, which is not so much true of AI. So we need to figure out what is a minimum kind of required disclosure. And as you say, that may involve regulation ultimately.

And of course, if these systems were absolutely perfect and absolutely better than all doctors, then we should probably trust them, right? So if they become that good, then we trust them. But of course, they're not at that level yet. And in fact, they may not be, which brings us to our next article from VentureBeat. MIT researchers warn that deep learning is approaching its computational limits.

And so the summary here is that historically progress in deep learning has been fairly reliant on advances in computation, so compute. And so newer methods like neural architecture search, NAS, are extremely compute hungry and they have a huge environmental impact that is starting to garner attention, especially in the NLP, natural language processing space.

And so a research team from MIT, the MIT-IBM Watson AI Lab, Underwood International College, and the University of Brasilia came together and asserted that continued progress will require dramatically more efficient deep learning methods. These are either new methods or more efficient versions of existing methods; you can imagine distilling a model to be smaller and making it more efficient.

The researchers analyzed 1,058 papers from arXiv, which serves preprints, as well as other benchmark sources, to study the connection between deep learning performance and its computation. And they found that it takes exponentially more compute to get incremental improvements at this point in time, which is very concerning. Andrey, what do you think?

Yeah, this is an interesting paper. I think it's cool that we are getting more and more of these sorts of empirical papers evaluating, with a lot of data, the kinds of trends that we're seeing. I think one caveat to be noted here is that some of the examples, for instance, are with very, very popular benchmarks. So one of them is the ImageNet top-1 error.

And they do show, they have a graph showing, that it takes kind of exponentially more compute to get small improvements on the top-1 error. But then again, the top-1 error is at this point relatively low and maybe saturating to a point where, you know, there's not much further we can go.
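To make that diminishing-returns relationship concrete, here is a small illustrative sketch, not taken from the paper itself, that assumes test error falls off as a power law in training compute, which is roughly the kind of relationship the MIT analysis fits; the constants here are made up for illustration.

```python
# Illustrative only: assume error follows a power law in compute,
# error = a * compute**(-b), loosely in the spirit of the MIT analysis.
# The constants below are made up for illustration.
a, b = 1.0, 0.1

def compute_needed(target_error):
    # Invert error = a * c**(-b)  ->  c = (a / error)**(1 / b)
    return (a / target_error) ** (1.0 / b)

for err in [0.10, 0.05, 0.04, 0.03]:
    print(f"target error {err:.2f} -> relative compute {compute_needed(err):.2e}")

# With b = 0.1, halving the error from 0.10 to 0.05 costs roughly 1000x more
# compute, which is the "exponentially more compute for incremental gains" pattern.
```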

So a good caveat to note is that there are a lot of tasks in deep learning that are maybe less explored right now and that require more architectural or conceptual innovations, and there a lot of progress is being made without huge compute. Some examples being, let's say, 3D computer vision, which is a pretty young area, and I think in reinforcement learning there's a lot of algorithmic work. So again,

I would say this paper is well done for a particular type of research, but that type of research is not all of deep learning and maybe even not the most exciting or interesting part of deep learning. What do you think, Sharon?

I think that's true. And also, the top-1 error on ImageNet, if you've actually looked through ImageNet images, doesn't always make that much sense, because I think the human classification agreement would be

potentially worse than what the models are doing now. And it doesn't make much sense if you actually go look at the images. Some of them look like they have multiple classes, and it doesn't even look like the primary class they're tagged with is the class that should be primary. An example would be that I've definitely seen images of lemons that also have apples and oranges in there, but the label says, no, this is a lemon. So top-1 means the model's first classification has to be the correct class, but you can't just say that image must be just lemons; it also has oranges and apples. I would say that happens quite frequently in ImageNet. So even the benchmark itself might suggest some issues, where extra computation is really just trying to squeeze out some kind of overfitting onto that data set, I believe.
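For readers unfamiliar with the metric, here is a minimal, hypothetical sketch of how top-1 versus top-k accuracy (top-5 on ImageNet) is typically computed from model scores; the arrays below are made up for illustration.

```python
import numpy as np

# Hypothetical scores for 3 images over 5 classes (only the ranking matters).
scores = np.array([
    [0.10, 0.60, 0.10, 0.10, 0.10],   # highest score on class 1
    [0.30, 0.20, 0.10, 0.30, 0.10],   # argmax picks class 0
    [0.05, 0.05, 0.10, 0.20, 0.60],   # highest score on class 4
])
labels = np.array([1, 3, 2])  # made-up ground-truth classes

# Top-1: the single highest-scoring class must match the label.
top1 = (scores.argmax(axis=1) == labels).mean()

# Top-k: the label just has to appear among the k highest-scoring classes.
k = 2
topk_preds = np.argsort(scores, axis=1)[:, -k:]
topk = np.mean([labels[i] in topk_preds[i] for i in range(len(labels))])

print(f"top-1 accuracy: {top1:.2f}, top-{k} accuracy: {topk:.2f}")
```

The gap between the two numbers is exactly Sharon's point: when an image plausibly contains several classes, top-1 penalizes the model even when the "correct" class is among its top guesses.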

Yeah, definitely. And I think that holds true for all the other examples they have here of deep learning tasks. So they also do things like machine translation and object detection, which are very thoroughly studied problems and data sets. And so at this point,

When you apply a new architecture, you wouldn't expect, let's say, a huge change. So it makes sense that you need a very drastic change to get relatively small improvements. But it's important to be mindful that there's a lot more to research than just improving these particular metrics. And I think we've also discussed before that even just

accuracy itself, like performance, is not the only goal for research. We are also researching things like learning from less data, being more compute efficient, all these other things. So the way you measure progress, right, need not be just with accuracy on these particular benchmarks,

which I think does lead to the point that the article title, MIT researchers warn that deep learning is approaching its computational limits, is a bit much. I mean, computational limits imply a lot of things, and it's a bit overstating the case, although in some narrow sense that's true.

There was actually another article titled "Prepare for Artificial Intelligence to Produce Less Wizardry" by Wired.com, which is even more misleading, right? Because we are already producing this wizardry quite well, and we are exploring ever more avenues to do interesting things in. So, yeah, especially that latter article, I would say,

is not quite what the paper is saying. It's still very exciting what we're doing in AI research, I suppose. Right, right. So we definitely need to think through these caveats beforehand. And speaking of thinking through caveats, perhaps even more than caveats, our next article is from Reuters: German court bans Tesla ad statements related to autonomous driving.

So a German court actually banned Tesla from repeating their quote unquote misleading statements about autonomous driving and driver assistance systems, essentially autopilot. So criticism of Tesla's autopilot system has been fairly rampant among consumers. And of course, in Germany, the worry is that these claims actually cause drivers to use much less caution with vehicles.

Elon Musk said this month that Tesla is close to quote unquote level five autonomy, which means that it's fully autonomous. And so that means driving without any need for any passenger input. And that, of course, is fairly far from the truth as it stands now. I love Elon, but yeah. What do you think, Andre? I feel like this was this was coming. Yeah.

I think this is an interesting case. I think Tesla has been getting a lot of flak, actually, for a long time for how it's marketed its Autopilot and its claims about autonomous driving. And in particular, the court mentioned that the ads included statements like the cars having the "full potential for autonomous driving" and being "Autopilot inclusive."

And that really, I think, does give you the idea that these cars are close to just driving themselves without your input, which, disregarding Musk's claims on Twitter, all signs point to being still very far from the truth. And Musk has been saying similar things for many years, that we'll have full self-driving very soon.

So I think it's past time that there was some regulatory kind of scrutiny on these misleading claims. It's important for people to not be misled in advertising, and for trust to still be there in AI systems. And yeah, I think it's maybe the first time I've seen this, and I'd be curious to see if this happens more in the future.

I agree. And I think false advertising arguably falls under various consumer protection laws. And I think this definitely concerns safety.

And I've actually thought about this before with a friend who's a lawyer, and he's very, very concerned about this, especially with Tesla. And what's really interesting is that this completely makes sense. And it's kind of interesting because actually we don't know whose job it is to be regulating this, essentially. Like, is it NHTSA? Is it essentially the...

the people who are regulating, you know, autonomous driving or is it the consumer protection agencies? You know, who is it who should be regulating this? And I think that actually makes it very difficult. And it's quite refreshing to see Germany actually do this. I think this makes sense. I heard about a story a while back, actually, where someone was driving on the Autobahn and thought their Tesla was completely autonomous at the time. This was probably a year or two back and just went to the back seat

and took a nap, and, well, died in a crash, because it's not completely autonomous. And that's really frightening. Yeah. Yeah, definitely. If you look, actually, there's a Wikipedia page for fatalities from self-driving cars, and there's a table listing all the currently known historical incidents.

And a lot of them, I think the majority of them are Tesla incidents from similar cases. There's been, I think, around half a dozen at this point of various Teslas crashing while their drivers are letting the autopilot drive. So that just compounds this issue of claims being made while there have been these fatalities that actually made the system look pretty questionable in terms of how they got into those accidents.

So, yeah, I guess it's good that we are seeing the scrutiny, I suppose. And speaking of things needing scrutiny, we're going to move on to our next article, which is quite intriguing, from Reuters, and it's titled, Deepfake used to attack activist couple shows new disinformation frontier. So the short version is there was a fictional person named Oliver Taylor,

who accused a London academic named Mazen Masri and his wife of being known terrorist sympathizers. And after there was some scrutiny, it was shown that Taylor's photo is a deepfake, and there was actually a whole elaborate fiction about his background. So the university that he was supposedly associated with said there was no record of him.

He had a minor online footprint, like being on Quora, but he was not very active there. Two newspapers that published this work of his say that they tried and failed to confirm his identity. And experts in deceptive imagery then used state-of-the-art forensic analysis programs to determine that the profile photo used for these articles was indeed an AI-generated image, so basically what is known as a deepfake.

So we haven't seen too many cases of really such malicious and sort of, you know, even dramatic uses of deepfake so far. There's been a lot of very negative, harmful things with, for instance, deepfake-based porn. But I haven't seen anything kind of as dramatic and direct as...

Yeah, I find it very interesting. How about you, Sharon? Yeah, I haven't seen something of this nature before. And I think what makes this very concerning is that,

I mean, it's a form of forgery, but I guess we can't apply those awful facial recognition algorithms to find who this person is, right? Because they don't exist. Or if we do, we would find someone who didn't do it, which is also bad and probably worse.

And yeah, so it does go to show that this is starting to take hold. And I hope that the authorities use other means to track this person down, whether it be tracking this person's IP address or something. There are just so many ways to spoof everything now. It's kind of frightening. So it's almost like, what is real, right? Yeah.

Exactly. Yeah. And it seems like this is a growing trend now. So last week, the publication The Daily Beast also revealed that there was a network of deepfake journalists like this submitting articles to spread these ideas online.

And so it seems to be sort of a thing we might have to start contending with, on top of all the other problems we have in media and figuring out the truth.

I guess the good news is this was a relatively minor publication and the article didn't get much engagement. So presumably more kind of well-known publications would not have published something by a deepfake journalist. But still, I think this is

quite dramatic, and maybe we'll start seeing essays on Medium that are by fake people and so on. And so we'll have to be personally aware that this is a real possibility and, I guess, be mindful of that.

This does make me wonder, you know, what will the... I imagine there will be some kind of new evolution of media now that so much is fake or so much is echo chamber that, you know, what is that next evolution? If any, I do wonder if there's something much better that we could have where the incentives aren't so...

so bad that it causes this spiral out of control in one direction. And so I wonder what that might be. And I wonder if we will devolve into almost like the former society of having just villages, right, where we meet up with people. But that sounds unlikely, too, given this Zoom world of coronavirus. So I'm not sure.

Not sure if you have any thoughts, Andrey. Yeah, it's a really interesting question of, like, given how hard it is to tell apart truth and falsehoods, and ever-increasing disinformation and misinformation, kind of how do we get a handle on all this? And I've seen some different takes that I found interesting. So for instance, I've heard in discussions that

Media organizations like New York Times and other kind of established reputable sources will start being even more important in the sense that they will kind of provide documentation as to where a particular piece of media came from. They will kind of verify it is legitimate.

and not a deepfake. And we'll basically need these signatures, these digital signatures on articles, on images, on videos to really know they're real and not deepfaked, not created with AI. So I guess the hopeful version is we'll manage to put together a system like that where we have methods of verification and any fake things are caught in the system. Of course, this will limit the ability of people to

express themselves and put things out there. But at this point, maybe that's a trade-off we need to make. Have you seen any ideas or approaches or possible futures that have resonated with you, Sharon? Like for dealing with deepfakes, I suppose, in terms of how we can make sure it's possible to tell apart false information and facts, and how you get a handle on this big problem.

I've seen band-aids, so band-aid solutions for this, which is, you know, having fact checkers and everything. But I'm not sure if that exactly solves the situation. Or, it definitely helps, but I think not everyone would believe that either. So, yeah, I'm not sure.
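As a rough illustration of the verification idea Andrey described above, where publishers sign their media so consumers can check it hasn't been fabricated or altered, here is a minimal sketch using Ed25519 signatures from the Python cryptography library; the workflow and example content are hypothetical, not any publisher's actual system.

```python
# Minimal content-signing sketch, assuming the "cryptography" package is installed.
# A publisher signs the bytes of an article or image; anyone with the publisher's
# public key can later verify the content hasn't been altered or fabricated.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Publisher side: generate a keypair once, publish the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

article = b"Text or image bytes of a published piece (hypothetical example)."
signature = private_key.sign(article)  # distributed alongside the content

# Reader side: verify the content against the publisher's public key.
try:
    public_key.verify(signature, article)
    print("Signature valid: content matches what the publisher signed.")
except InvalidSignature:
    print("Signature invalid: content was altered or not from this publisher.")
```

Of course, this only establishes provenance, who published something and that it wasn't tampered with, not whether the content itself is true.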

Yeah, I mean, I guess, yeah, the short version is this is a complicated thing. And the good news is that there are organizations and teams and smart people thinking about it much more than we are and hopefully coming up with maybe more subtle and nuanced and possibly effective ideas that hopefully we'll start seeing more of as deepfakes also become more prominent. Right, absolutely.

And on the topic of deepfakes, well, generative models: deepfakes are generative models for images. Recently, there's been a generative model for text that has been gaining quite a bit of attention, and that is called GPT-3; the three stands for the version. So there's an article titled GPT-3, an AI that's eerily good at writing almost anything.

So GPT-3 recently came out from OpenAI, and folks have been posting about the powers of GPT-3 as well as its limitations. What have you seen, Andrey, and what has caught your eye? Yeah, this was really big on Twitter lately, as you probably know. Like many, I was taken aback by an early demonstration that maybe started all this recent hype.

There was a demo of using GPT-3 to create code for web layouts of websites. So you input, in English, something like a big red button that says blah blah blah and an emoji next to it, and it generated for you the HTML and CSS or whatever.

And the interesting part here is that GPT-3 wasn't trained to do this. It was trained for a very simple task of predicting what words come next, given some amount of previous words. So it's just kind of autocomplete.
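For context, the training objective Andrey is describing is standard autoregressive language modeling: maximize the probability of each next token given the tokens before it. Written in the textbook form (not the GPT-3 paper's exact notation):

```latex
% Autoregressive language modeling: maximize the log-likelihood of each
% token w_t given the preceding context w_1, ..., w_{t-1}.
\mathcal{L}(\theta) = \sum_{t=1}^{T} \log p_\theta\!\left(w_t \mid w_1, \dots, w_{t-1}\right)
```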

And the amazing thing is this autocomplete can be prompted, can be conditioned, to do a whole lot of more specific things, like producing code, if you just give it an example input. So that was one. I've seen some other ones, like you can have conversations with various kinds of characters. You can say, oh, GPT-3, you are Einstein and I'm talking to you, talk to me as if you're Einstein. And it actually does take on that role and can talk to you like that.
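To make the "prompting as conditioning" idea concrete, here is a hypothetical sketch of what a few-shot prompt might look like; complete() is a stand-in for whatever text-completion call you have access to, not a real GPT-3 API function, and the description/HTML pairs are made up.

```python
# Few-shot prompting sketch: the model only ever does "continue this text",
# but the examples in the prompt condition it to keep following the pattern.
# complete() is a placeholder for an actual text-completion call.

def complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real text-completion API call")

prompt = """Description: a big red button that says "Subscribe"
HTML: <button style="background:red;color:white;font-size:2em;">Subscribe</button>

Description: a centered heading that says "Hello" with a wave emoji
HTML: <h1 style="text-align:center;">Hello 👋</h1>

Description: a blue link to example.com that says "Read more"
HTML:"""

# Having seen two description->HTML pairs, the model tends to continue with
# a third HTML snippet matching the last description.
# print(complete(prompt))
```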

So it's been definitely quite interesting and cool and also very hyped for a while. How about you, Sharon? What have you seen? I've definitely seen that as well. I thought that was pretty cool that JavaScript, HTML, all of that,

all these languages that we use for programming are in fact languages underneath. Like, we all know that we call them languages, but GPT-3 really takes that to the next level and says, yeah,

you're a language and I'm going to produce you like a language. And so that's pretty cool. So seeing GPT-3 produce templates from natural language is really cool. And the funniest comment I've seen is, "and this is why I procrastinate on learning how to code." I think that's pretty funny. So yeah, it's been quite an impressive feat. And I would say that,

I think something else that's kind of interesting and funny is that, you know, before, we would craft our models, and then neural architecture search came out, and it's like, okay, maybe we don't even have to design the architecture, we don't have to tune the architecture anymore. So instead of tuning architectures, we tune hyperparameters. And now we don't even have to tune hyperparameters; we really just have to tune the prompt. And that's kind of true in a sense, where you have to tune the prompt because

it's not every time that it produces exactly the output you want in text or another language. And so you have to maybe try 10 times, and one out of 10 it sounds good. And I know there's also an article that got pretty upvoted on Hacker News,

and they reveal halfway through the article that GPT-3 had written that part, that whole article. But it wasn't revealed until midway through. And it was an article that was pretty meta: it was about GPT-3 and its limitations. So it was funny because in the comment section, you could tell who had read the article or not, or how far people had read, based on whether or not they had hit the gotcha.

Yeah, that was kind of a funny one. Actually, I wonder, did you read that GPT-3-generated article? What did you think about it? Not fully, but yeah, I was impressed.

Yeah, I started reading it and I saw in the comments already that it was generated by GPT-3 and it was quite coherent, unlike a lot of previous similar work. So it kind of made sense. It wasn't nonsensical until many paragraphs in. Although I would say I was pretty critical of its writing skills. It wasn't quite as good as your traditional good blog post. Right.

Maybe your blog post standard is too high. It could be, it could be. Is it better than a high schooler? Is it better than an elementary school student? I don't know. You know, like we could always, yeah. And there's also the question of substance, like how much substance can it produce

in X amount of words, you know? Yeah. Yeah, exactly. But I think, yeah, this has been very cool. As you said, there are many caveats, like its reliability, that people have started noting, because when you first see these demos, you might be really amazed and think, like, you know, human-level AI is almost here. But there are many caveats to be aware of.

That being said, this actually also makes me think of the thing we just discussed of how big compute is leading to diminishing results. Well, this is a counterpoint, right? GPT-3 conceptually isn't anything new. They just took an existing idea and scaled it up.

So they had a giant model with 175 billion parameters, way more than is typical in AI. And they trained it on basically all of the Internet to do autocomplete from any kind of bit of text. And it turned out to have this qualitatively different skill compared to prior models, that it could be prompted to do different tasks

with very little, very few examples. So that's something that was, I would say, pretty unexpected, I think, by the community. And it shows that, yes, by some metrics, we get diminishing returns. But in other ways, as you scale up compute, maybe you will get qualitatively different results. And that, of course, definitely matters in research. And I think that's surprising on the one hand and also expected on the other, where, you know,

even just, if you think of human society, when we think of scaling up numbers of people, we think of a team, and then an organization, and then a city, a nation, and then the world. And at each of these scales, we can accomplish different sets of things, right, and we think of different types of things. So it does, in a sense, kind of make sense that scale at a certain level will turn into something that is qualitatively different.

But we don't know what those thresholds are necessarily in AI. But here we seem to have crossed one. Exactly. And in that sense, OpenAI should be commended for making the investment. This was a very expensive model to train. It took a lot of work. And it was a bet of theirs that you would get something qualitatively interesting and not just more of the same. And in this case, it seems to have paid off well.
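As a rough back-of-the-envelope illustration of why a 175-billion-parameter model is such an investment (these figures are an approximation, not numbers from OpenAI), just storing the weights is already enormous:

```python
# Back-of-the-envelope arithmetic for GPT-3-scale weights (illustrative only).
params = 175e9           # 175 billion parameters
bytes_per_param = 2      # assuming 16-bit (half-precision) storage

weight_bytes = params * bytes_per_param
print(f"~{weight_bytes / 1e9:.0f} GB just to hold the weights")  # ~350 GB

# For comparison, a typical high-end GPU of that era had 16-32 GB of memory,
# so even running inference requires splitting the model across many accelerators.
```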

And with that, thank you so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at skynettoday.com. And this entire podcast was produced by GPT-3. Just kidding.

If only, no, we actually had to do the work. Subscribe to us wherever you get your podcasts and don't forget to leave us a rating if you like the show. Be sure to tune in next week.