Hello and welcome to Skynet Today's Let's Talk AI podcast, where you can hear AI researchers talk about what's going on with AI. I am one of your hosts, Andrey Kurenkov. And I'm your co-host, Dr. Sharon Zhou. And today we have a special crossover episode with the Towards Data Science podcast, which has lots of interviews with interesting people from AI and data science.
So let me welcome the host of that podcast, Jeremy. Hey, everybody, especially if you're a longtime listener of Let's Talk AI. Yeah, I'm Jeremy. I'm from the Towards Data Science podcast. If that's where you're listening to this, then hello again. Yeah, I'm really excited about this. I think this will be really cool. I should mention, I'm coming at you guys on like...
16 hours into dose two of Moderna and I'm feeling like the good kind of febrile. So we'll hope that this makes for a good episode. But anyway, that's the status of my end. I'm really looking forward to this. All right. So you got some chemical enhancers. It's going to make it fun. And yeah, we are also quite excited. This is our first crossover with another podcast. So something kind of new.
And yeah, so since we're doing a crossover, we had kind of a fun idea: discussing some of the highlights, for us, of news about AI so far in 2021. So usually we discuss on a weekly basis what's been going on in the last week. Now we're taking a look back and just going to chat about some of the stories we thought were interesting.
Yeah, so go ahead and take it away, Jeremy, with your first picked news story. All right, let's kick this sucker off. So yeah, I guess...
First things first, you know, if you're ever in a position where you're trying to think of like how to do a roundup of top AI stories in like an entire half year and that half year happens to be 2021, then, you know, good luck. There are a lot of things to pick from. So, I mean, I guess I tried to pick a story here that says something about the general direction of the field, something fundamental about
the direction I think it's heading in, and maybe that'll be an interesting part of the discussion too. The title of this article is "This Avocado Armchair Could Be the Future of AI." And I think if you've spent any time on the OpenAI blog or follow OpenAI on Twitter, you probably know exactly what this refers to. So these are actually their two recent
mashups of language models and images, models that try to put those two things together. And I think this represents an increasing tendency in the field to go multimodal. So we're seeing more and more mixing of different operating modes. The basic idea here is, you have two different models. One of them is essentially a kind of classifier — it's not exactly a classifier, but it's kind of a classifier — and that's CLIP.
And so CLIP is this model that classifies images, not in the standard ImageNet way, where you have a label associated with every image and you just predict which bucket it falls into out of a finite set of buckets. Instead, what OpenAI did here is they used essentially an embedded version. So they collected a huge data set of images and captions, and then they created, using a language model basically, an embedding for each caption,
and kind of predicted the embedding. So essentially, you have a mapping from a fairly general and robust image to a fairly general and robust text description of that image. And so it's not about saying, "This is an image of a cat," or, "This is an image of a dog." It's about saying, like, you know, what might the caption be? Oh, it's a cat, you know, sitting on a pillow at dawn or something like that. So it's able to capture a lot more of the semantic kind of meaning behind these images.
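To make this concrete, here is a minimal sketch of the contrastive image-caption matching described above, assuming hypothetical stand-in encoders (random projections) rather than OpenAI's actual CLIP models:

```python
# Minimal sketch of CLIP-style matching: embed images and captions into the
# same space, then score an image against free-form captions.
# The encoders below are hypothetical stand-ins, NOT OpenAI's models.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 64

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for a vision encoder: flatten and randomly project."""
    proj = rng.standard_normal((image.size, EMBED_DIM))
    return image.flatten() @ proj

def encode_text(caption: str) -> np.ndarray:
    """Stand-in for a text encoder: seed a random vector from the characters."""
    seed_rng = np.random.default_rng(sum(map(ord, caption)))
    return seed_rng.standard_normal(EMBED_DIM)

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def best_caption(image: np.ndarray, candidate_captions: list[str]) -> str:
    """Classification as caption matching: pick the caption whose embedding has
    the highest cosine similarity with the image embedding. The "label set" is
    just free-form text, not a fixed list of buckets."""
    img = normalize(encode_image(image))
    texts = np.stack([normalize(encode_text(c)) for c in candidate_captions])
    return candidate_captions[int(np.argmax(texts @ img))]

# Usage with a dummy image and free-form captions.
image = rng.standard_normal((8, 8))
captions = ["a cat sitting on a pillow at dawn",
            "a dog chasing a ball in the park",
            "a sketch of Spider-Man"]
print(best_caption(image, captions))
```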
And this inherits from a long line of similar experiments that started as far back as 2016, when essentially you could
do the same thing. Some people tried doing this: you take an image, you map it onto a word vector, or some dimensionality-reduced representation, using a less sophisticated language model. And you still get impressive results, but not quite as impressive as what CLIP can do. So that's CLIP, the classifier. And on the other side of that is DALL-E, which I'm pretty sure I'm saying right — it's sort of meant to be like the WALL-E movie.
The name is inspired anyway by that, so I assume the pronunciation is too. And yeah, so DALL-E is kind of the generative version of this in a sense. When the paper was first published, it was a little ambiguous what the actual algorithm was. And since then, I mean, there were a couple of hints that were dropped and things like that. So I won't get too deep into the details of how DALL-E actually functions. But the bottom line is that this is an algorithm where you feed it a verbal description of what you want it to draw.
And you can also feed it some kind of slices of an image, like some raster scanned chunks of an image, and it will fill out that image. Or you can just feed it a verbal description and then the image gets filled out. So this is like...
I think one of those proof points that OpenAI is working towards as it tries to scale up things and mix between different modalities, we're really seeing like here the power of large language models, very scaled things, and how you get this almost emergent, I want to say almost creative kind of behavior, which is what DALL-E does. If you look up some of the images it can produce, they truly are striking.
But maybe I'll park this there, just to see if either Andrey or Sharon, you guys object in any way to my description, or you want to toss in any thoughts about it. I think those are great descriptions. I'm super excited about the direction where things are going in terms of the multimodality aspects, because I think before it was hard to do that,
because you needed the multiple different modes, like NLP and computer vision, to actually each be good enough. And now that each of them is good enough, we can actually get decent embeddings for each of them so that they can interact. And in the case of CLIP, we are actually getting them to embed in the same space. And so that's super, super important,
I think. And I think as a human, when I think through different things, I do parse things differently, from different modalities, of course — vision versus speech versus text. But I also think of them as almost the same embedding and representation in my head as well.
Like, a picture of a cat is a cat visually, and also a cat as the word in the caption "picture of a cat." So all of that is the same to me. So I think this trend makes a lot of sense.
Yeah, and I believe they actually demonstrated this later on, when they published something like a study of individual neurons. And they showed that a single neuron responded both to a photo of Spider-Man and to a sketch of Spider-Man, and even to related text — something like that — which was another very cool piece of work.
And yeah, I think you're right that this is part of an emerging trend. It is also a continuation of a trend of transformers changing everything and just pure scaling. Because as you said, you know, this idea isn't new. Really, there's relatively little about the basic approach that's novel. It's more about scaling up and using more powerful architectures like transformers to
to really get some striking results, which in some ways is also what GPT-3 did. Just by using this architecture scaling up, you got qualitatively very cool stuff.
Yeah, I actually think I really agree with both of your perspectives on this. I think it's really hard to understand the decisions that OpenAI has been making over the last, let's say, five years or so — or three years at the very least — without looking at it through the lens of Rich Sutton's Bitter Lesson: just the idea that we don't need more clever algorithms so much as we need more scale, more compute, more data. Basically, it's just a matter of horsepower.
And we're really seeing them bias in the direction of these highly general — not general from a capability standpoint, but general from an architectural standpoint — models, like transformers, that can be used just as well on images or text. So they've kind of made this commitment: let's just come up with a very repeatable set of building blocks, and then let's put a crap ton of them together and hope that it scales well. And so far it's been scaling well, which is,
let's say, jarring if you're focused on tracking AI capabilities and staying ahead of them. I think it also is very general in terms of its ability to do different tasks, since something that we've learned is that a lot of these models — GPT-3 in particular — were trained to literally just predict the next word, or the next token rather, and can do a bunch of different other tasks
with basically zero-shot performance, very good zero-shot performance. And so it can suddenly do translation because it happened to learn that through this whole process. It can suddenly do summarization. It can suddenly do regular text generation. And it could do all of these things. And for a lot of them, it can beat the tuned models, the handcrafted perfect models to get there.
And so I think it is very general in architecture and very general in terms of possible tasks, kind of eating up the world of NLP, and now with the multimodal stuff, maybe eating up the whole AI space. I mean, they're going towards AGI, so maybe that's the call.
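To make that zero-shot point a bit more concrete, here's a rough sketch of how very different tasks reduce to next-token prediction purely through prompting. The `complete` function is a hypothetical stand-in for a call to a large language model, not OpenAI's actual API:

```python
# Sketch: one next-token-prediction model, many tasks, selected only by the prompt.
# `complete` is a placeholder you would swap for a real language model call.

def complete(prompt: str) -> str:
    """Placeholder for a next-token-prediction model generating a continuation."""
    return "<model continuation would go here>"

def translate_en_to_fr(sentence: str) -> str:
    # Zero-shot translation: no fine-tuning, just a prompt framing the task.
    return complete(f"Translate English to French.\nEnglish: {sentence}\nFrench:")

def summarize(article: str) -> str:
    # Zero-shot summarization: same model, different framing of the prompt.
    return complete(f"{article}\n\nTL;DR:")

def continue_story(opening: str) -> str:
    # Plain text generation is just the raw training objective, no framing at all.
    return complete(opening)

print(translate_en_to_fr("The cat sat on the pillow at dawn."))
```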
Yeah, and I must say, I think I was part of the skeptics as far as OpenAI for a while. And I think part of it was for a while they were scaling up reinforcement learning. That was their main thing. So they went for, you know, Dota, stuff like that. They had robotics. And the problem with that was always, well, you can scale it up, but you're still limited to simulation, right? And so your problems are very limited. And here they scaled up in a different dimension, right?
With GPT-3, they had self-supervised learning, which is another trend, and a similarly huge data set that they scraped from the internet. So I think they really may have found the right approach to scaling, and I actually found their paper on scaling laws very interesting too. So I think both from demonstrating what you can do and from studying it, it's been really cool. Yeah, yeah, definitely agree.
Something that's interesting is that I think the infrastructure — building out the giant infrastructure — is now almost the competitive moat that they've built, instead of the architecture. Because that's simple. The way it's trained is simple. Even the code now is so small: if you look at the code needed to build a transformer model, it is tiny in JAX, for example. And so now it's all about the infrastructure to clean up, preprocess, and scrape all of this data together. And that is a big lift, since that's a lot of data.
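As a rough illustration of Sharon's point about how small the modeling code has become, here is a single self-attention layer, the core building block of a transformer, written in plain NumPy for this sketch; the JAX version with jax.numpy looks nearly identical, and as she says, the real lift sits in the data pipeline and infrastructure around it:

```python
# A single scaled dot-product self-attention layer, the core of a transformer.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projection matrices."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot-product attention
    return softmax(scores) @ v                # weighted sum of values

# Usage on random inputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 64))
Wq, Wk, Wv = (rng.standard_normal((64, 32)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)    # (10, 32)
```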
Yeah, it almost seems like strategically the game plan here really is to look at the inputs, almost the economic inputs. And you actually hear the people at OpenAI talk like this when they discuss the big-picture vision of the company.
But, you know, they're talking about simplifying the economic inputs to AI. Right now, it's something like, obviously, massive amounts of compute — though that's a pretty new phenomenon — decently massive amounts of data, and then all of this bespoke,
customized machine learning and AI expertise, where people are solving new problems as they arise. And I really get the sense that, yeah, their goal is to take that constraint out of the equation, so that effectively we live in a world where the final legs of the race towards AGI become just a matter of scale. And, you know, you can wonder whether that's a good thing,
because they're making these big leaps. It's not like OpenAI is making a 2x bigger model every time they choose to grow things. It's like 100x bigger. I mean, that's what happened with GPT-3. Yeah, so we might well be surprised. Yeah.
Well, that's also what I find interesting here. I don't know if you saw EleutherAI — I don't know how to pronounce that — they just released their one-year retrospective. And that's, if you don't know, the group that kind of tried to recreate GPT-3 and is still trying to do that. It's sort of an open-source collective effort that isn't a company per se; it just sprung up when GPT-3 was announced and it,
you know, wasn't available, wasn't open. And they've already done something. They've collected a data set called The Pile. They've released smaller models that are like 6 billion parameters — smaller than GPT-3, but on the way. And I think they're still intending to get to GPT-3. And so, yeah, it's interesting.
They're trying to exceed GPT-3, actually, because I think GPT-3 has 175 billion parameters, and they want to get to a trillion parameters. So they're getting there. GPT-J, which just came out, is 6 billion parameters, and...
Yep, they are making their way towards that. And I think that is kind of one trend right now, with Hugging Face, the company — and actually there's an event right now for JAX, to build a lot of these open-source models using TPUs that
Google is essentially donating towards this open-source effort: to make a mini DALL-E, for example, or — actually not a mini — to retrain a lot of these large models, but towards other ends. For example, GPT-2 or GPT-3 for Romanian,
for example. Yeah. And that's actually — I think there's kind of a debate that's happening almost under the surface here, or implicitly. I had a conversation with Connor Leahy, who I think is one of the founders of that initiative. And we were talking about safety, because he's a very AI-safety-focused person. He's focused in particular on AI alignment. So, sort of
averting the risks that come with an AI that might not have the right loss function, or that might be deliberately deceptive once it's, let's say, sufficiently capable.
And we were talking about this in the context of, on the one hand, you want independent researchers to be able to do AI safety research. And a big part of the reason why you want this is that firms like Google and OpenAI, and now we've got Huawei who's jumping in. They've got their own version of GPT-3. All these orgs are racing, and when you're racing to scale something,
Like your safety comes from your margins, your profit margins. And when you're competing with people, margins erode. And so the space for safety becomes very limited. This is where the value of independent research comes in. People can be exclusively focused on it, but they can't make much progress unless they have access to these big scaled models. That's kind of like the pro argument. And then the flip side of that is like, OK, but you're also causing these things to proliferate.
and they become much harder to track. GPT-3 wasn't released by OpenAI for safety reasons, or at least that was their argument. You can imagine that happening over and over, more scaled models. So it's kind of an interesting balance that we've got to strike here. And I honestly don't know where I fall on that, but it's an interesting debate anyway.
I think there are interesting ways to maybe align incentives with the fact that, you know, if it's an unsafe model, we can't really use it reliably for things, right? If GitHub Copilot maybe produces, like, racist variable names, which I heard was a thing, then maybe this isn't the right product. Maybe it needs to be a model that was constrained from the beginning, or had some kind of much more thoughtful pre-processing, post-processing, or even
training or a training objective modified for safe use. So, yeah, I wonder if there is some incentive alignment that could happen to make these models more safe, at least along certain axes. Yeah, I guess we might hope that as OpenAI commercializes, they would be incentivized to take some of these things into account.
Of course, there's an active area of research in AI on training language models without bias, with fairness metrics and so on. And so integrating that — not just having a race for scaling, but having a race for responsible scaling — would be interesting. But it is, yeah, maybe optimistic to say they'll slow down to make sure there are no issues. We'll see.
Yeah, it's an interesting though consideration too, right? Like I think Sharon, you're exactly right. Like there is this camp of people who do tend to think that, you know, AI capabilities and AI alignment are going to have to grow in parallel at a certain point because you just can't have like crazy powerful AI systems that are deployed in important realms that generate value without being safe. Because, you know, eventually you're going to be
running into like really shocking and weird failure modes that nobody anticipated. So I definitely agree. I think there's an interesting discussion to be had there about what the future of this stuff really looks like. Okay, so yeah, that was really fun talking about OpenAI, and definitely do google DALL-E and these avocado armchairs, because there are a lot of fun images along those lines and it really is uncanny to see what it can do.
But moving on to another topic, the one I chose. We have here this article called Facial Recognition Tools in Spotlight in New Jersey False Arrest Case. So the short version is, in 2019, a person named Nijeer Parks was accused of shoplifting candy and trying to hit a police officer with a car.
And the police apparently identified him using facial recognition software, even though he was 30 miles away at the time of the incident. And he spent 10 days in jail, paid $5,000 to defend himself. And then the case was dismissed for lack of evidence. And now this person is suing the police and the city for false arrest, false imprisonment and violation of civil rights.
And the reason I chose this one is, well, Sharon, you know, we talk about facial recognition all the time. It's just the biggest topic as far as AI in society, beyond research. And this one in particular — this is the third person known to have been falsely arrested based on a bad facial recognition match. And in all three cases, the people identified by the technology have been black men. So, yeah.
It's, I think, a really relevant thing to be aware of: you know, talking of racing in AI, facial recognition is being deployed and used as a product, and it seems like it's really not being used thoughtfully or done well. So more of a bummer topic, but I do think something very relevant in 2021. Yeah, and it's funny how all these things do come back to kind of the same theme of
improperly aligned AIs, in a way. Like, we've clearly specified, in some sense, a loss function that does not reflect social values, at the very least. You know, we can argue over what those values should be, but the bottom line is there's been, through the lens of at least some people, a failure here. And it kind of feels like these alignment failures are not being viewed through that lens. But to me, the kind of bias, fairness, ethics, and alignment universes
kind of feel one and the same. I wonder what your take on this is, actually, because you guys are the Stanford people here.
Oh dear, we do not speak for Stanford necessarily, but I definitely think it is a huge issue. And I think it's AI more broadly, too. We just had an article where someone googled themselves, and their name is actually the same as the name of a serial killer. But the Google card that came up
showed his face — his picture had been chosen for the serial killer's entry. And that's not great for his job prospects, having that. And of course, Google took it down immediately because he posted about it on Twitter. But just thinking through — now we have much more autonomous or AI-guided weaponry as well. So if something is incorrect, it could really harm certain people.
That's really alarming on the one hand. On the other hand, there's also the whole argument of AI is trained on us and we are very biased. And I think there's just the general feeling of where do we want AI to stand in terms of
its morality? Do we want it to be above us? And I think we do. We want it to be above the average of us, I think. And yeah, and it's like, is that a really high standard or do we want it to be like this perfect thing? And this is something that I feel like I've had philosophical debates about, you know, in relation also to just self-driving cars and everything. But at the end of the day, I think there should be some kind of
There should be fairness across different groups of people, which is not seen today. Yeah, and I also say, I think this is an interesting case because there's two dimensions here. One dimension is the AI algorithm itself.
And of course, there's been very notable research showing that facial recognition algorithms from major companies like Microsoft, Amazon, and IBM all performed significantly worse for black people compared to white people, which was a huge result in, I think, 2018. And it showed that, yeah, products are being deployed that are biased, you know, that people can pay for and use. So that's on the one hand. On the other hand, going beyond the AI algorithm and the alignment issue,
you have a case of the combination of human and AI. So here, you know, it's not just about the AI algorithm, it's also about how the police officers use it. And in all of these cases, part of the problem was that the officers didn't do their job properly. So in one case, you know, they...
just had this false facial recognition match, and then they had someone choose a photo from a set of six photos, and did not do any more investigation. So this match could have been very easily disproven if they had looked into any sort of additional evidence.
So I think it's also an issue of how people use AI, how they understand it, and how much they trust it. Yeah, and I think also how fairness is defined as well, because this is one of those thorny issues when you look into AI ethics and AI bias. I'm continually struck by this every time I do an interview with somebody who specializes in those areas. It seems like we haven't gotten to
the quantitative realm in terms of defining consistent ways to think about fairness. We have different mental models of what fairness could look like, and we can quantify them. So, for example, you have people who favor, let's say, equality of opportunity in some sense. So when you feed in a certain set of data points, in a sense it's almost like
you want the model to be more or less blind to class membership, or at least that can be part of it, where you can kind of use how often a class appears in the training set as a prior that informs the probability of
ranking people on the other side. And then you've got equality of outcome, where you say, okay, well, let's make sure that the same rate of errors is made for white people as for black people, as for Indians, and so on and so forth. And figuring out where to draw the line on those things — this is a problem.
Yeah.
We'll come to different conclusions about what's right and what's wrong. Obviously, like an incident like this is pretty cut and dry, but there's a continuum of incidents like this that are really difficult to parse. And it's really difficult to get consensus on how to think about these to begin with.
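To make the two notions Jeremy contrasts a bit more concrete, here is a small sketch that computes per-group error rates, the kind of quantity that equality-of-outcome style criteria (such as equalized odds) compare across groups. The data here is made up purely for illustration:

```python
# Sketch: compare a classifier's false-positive and false-negative rates across
# demographic groups. Equalized-odds style criteria ask these to match.
import numpy as np

def group_error_rates(y_true, y_pred, groups):
    """False-positive and false-negative rates per demographic group."""
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        t, p = y_true[mask], y_pred[mask]
        fpr = np.mean(p[t == 0] == 1) if np.any(t == 0) else float("nan")
        fnr = np.mean(p[t == 1] == 0) if np.any(t == 1) else float("nan")
        rates[g] = {"false_positive_rate": fpr, "false_negative_rate": fnr}
    return rates

# Toy example: a face-matching system's decisions split by group.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_error_rates(y_true, y_pred, groups))
```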
No pressure, philosophers listening in: you have to make a decision now, or organizations will make it for you. And that doesn't seem like the best outcome, because if we don't make a decision collectively — by philosophers, or by some group of people — then individual organizations will make that decision for us.
Yeah. Yeah. It's interesting because this is a tough issue, and it does speak to another trend, which is generally increased concern about ethical issues. So we just discussed a paper this week that looked into how often papers actually discuss, let's say, ethical issues
or potential negative impacts, even after the NeurIPS broader impact statement was introduced. It quantified this and showed that very few papers actually consider anything besides, you know, sort of performance — anything downstream.
And so, yeah, I think it's interesting in that things like the NeurIPS impact statement are kind of trying to push everyone, each individual researcher, to be more thoughtful. And then in addition to that, we have, you know, people who specialize in AI ethics research, who are partially philosophers because they are informed in ethics.
And now, you know, there's things like NAACL, which actually has a panel of ethics experts, where a regular reviewer, if they think there's a problem or something to consider, can flag it for additional review. And there's also the Stanford advisory board, where people can sort of get advice if they're not sure. So it's something the entire community needs to, I guess, understand and
reckon with: how can we make it possible to avoid negative consequences while still continuing to do research, and without getting bogged down in uncertainty or something.
Yeah, absolutely. And I think, just to reinforce the turtles-all-the-way-down facet of this, there's also the question of what the selective effects are on the people who show up at, like, Stanford to do this philosophy. And you can do this along any number of different axes. But to spare myself the hate mail, I'll just say there are many different axes: you could look at people who show up at Stanford and you could
look at it from a racial bias standpoint, from a political bias standpoint. You could look at it from all kinds of different perspectives, and you end up seeing quite quickly that this is clearly not just an objective group of people who are IID sampled from the general population.
And it's not even clear that IID sampling would even solve this problem. But setting that aside, this is just an absolute kind of epistemological and ethical mess. And it kind of feels like the main thing we have to do is just stay humble and not commit ourselves too dogmatically to any particular loss function at this stage. I could be wrong, but hey, maybe that's the humility part too. Well, I think...
Also, what we need to recognize is that these things are happening. This news story of people getting arrested because of bad usage of AI, and maybe flawed software — it's hard to say — is happening. So there are some cut-and-dried cases, not in research, but in terms of the
effects of AI in society right now. And there we can already do things. And as we have discussed, Sharon, there are cases of regulations being passed in cities and counties. So people on the policy side — politicians, who I suppose are supposed to decide what's good and bad to some extent — are starting to tackle these sorts of things. That's a really good point. Yeah.
And bringing it back to the article, I think there's another aspect that we haven't discussed quite yet, which is this over-reliance on AI systems that causes...
a lot of downstream issues. So here it's the police, but we also see this with a lot of Tesla drivers who decide not to sit in the driver's seat anymore and just let the AI take over, all the way to doctors who rely heavily on, let's say, an AI's prediction. And that's not great if it's going to be wrong some of the time, right? And so I think
a lot of that is very worrisome, but I think that is becoming more and more recognized by AI researchers — that even though we have a human in the loop, we can't just outsource that. We can't just expect the human to act the same as they would have before using that system. And I think that interaction is becoming much better understood by a lot of researchers. And hopefully that'll change the way AI
interfaces are designed or how we're thinking about, you know, whether this is safe to deploy or not. Yeah, really good point. Yeah, I guess we need some human computer interaction researchers in the mix when you put this stuff out into communities. So interdisciplinary understanding, you know, we need to move there as well.
All right. And on to our last piece that we're going to discuss today. A new Nirvana song created 27 years after Kurt Cobain's death via AI software from Billboard.com. This was about the lost tapes of the 27 Club, which came out earlier this year, which is a project that basically
basically uses AI to analyze lots of songs from each of several popular musicians who struggled with mental health and died at the age of 27 — so that's why it's called the Lost Tapes of the 27 Club. And that includes Cobain, Jimi Hendrix, Jim Morrison, and Amy Winehouse. And then it has the AI write and perform these quote-unquote new songs in their style. And this was done
in an effort to raise awareness about mental health. But of course, it also got a lot of buzz in terms of an AI being able to generate music. And this is along the lines of the general trend of generative audio this past year. So Magenta — that's Google's group that works on music generation, including classical music — has been doing a lot of work.
There was recently a video game modder who had modified The Witcher and trained an AI model to sound like the game's main character, the Witcher himself, to be able to let that character speak more lines. And that got a lot of praise, but also
pushback from voice actors in that industry. So this is a trend that we're seeing more and more of. And I think
generative speech, generative audio will also start to make its way into this space. And I think that it will very likely be rolled into that giant multimodal model that we chatted about before, with DALL-E and CLIP. Yeah, I remember when we talked about this on our podcast, we listened to the song at that point. I think it's no longer available, maybe because of legal issues.
And yeah, it sounded a fair deal like Nirvana, at least to me. It had that style of music. And so this also ties into another trend which has been ongoing for a while, which is the interaction of AI with artists. So as you say, another application was generating dialogue that sounds like a particular voice actor. And we listened to that and it also sounded pretty convincing.
So, yeah, now it's coming to songs, coming to writing with GPT-3, coming to voice acting, to music to some extent. So it's an interesting trend to see, and it's interesting to think that artists will also need to sort of understand AI and adopt it to some extent, and, yeah, find their space as AI can replicate, or at least augment, some of what they do.
Yeah, it's a great spot for regulators to be in too, right? I mean, you're looking at the IP implications of this stuff and it's like, this actually reminds me of, so way back in the day when I was a tiny little startup baby, we did YC in winter 18 and we had a startup in our batch that was doing basically voice synthesis. Yeah.
And they spent like an ungodly amount of time trying to convince — I want to say it was Morgan Freeman, I'm pretty sure it was Morgan Freeman — to help them automate his voice so they could use it for everything. And I believe
that it ultimately fell through. And part of this is, if you're Morgan Freeman, how do you price that out? How do you make the decision that, oh yeah, for this dollar figure, it is worth it for me to give away what are kind of like the rights to my own voice?
It's very counterintuitive. And now, like you both alluded to, we have artists who are going to have to become fluent in this kind of language and in the economics of this stuff going forward. And yeah, that seems like yet another thing to throw at them. But it's an interesting kind of area that's going to emerge really quickly, and it'll have to be regulated at some point, I imagine. Yeah.
Yeah, absolutely. And I think there's also the aspect of not just, I'm going to license my voice assuming it's perfect — it might not be perfect. It might be destroying my brand. And this is something that I've seen startups who have tried to
tackle this space struggle with, because they actually hurt someone's brand, since these voices can't be that expressive. So, the Witcher video game — he's a very stoic character, so that's probably the prime, perfect example of someone we can train an AI on right now. And so I think having control over your voice is also having control over your brand in some sense.
And also generally over what you say, what you're okay with saying. Right. And that gets into, of course, the whole deepfake, putting-words-in-people's-mouths territory. Right.
Yeah. Yeah. As you said, I think it touches on a lot of things, including deepfakes, which have been an ongoing concern. So far it hasn't really emerged as a problem — we've discussed a couple of small news stories — but especially if you can replicate voices convincingly, you know, it could be pretty bad. And another thing we've seen in this case:
this wasn't just AI. They actually did audio production afterward to make it sound more convincing. And I think it's possible that for The Witcher 3 that was also the case, to some extent. So now, you know, if you want to make a convincing thing, you can combine computer graphics or audio engineering with AI, and as a result get something that
you know, is better than both. So it's interesting to see, I think, how these will be integrated into existing tools of professionals. And I see Photoshop already has some things, but so far it hasn't been really integrated. But I guess in this decade, it probably will.
Yeah. And as that integration happens too, I guess one of the consequences will be that people with fewer and fewer resources are going to be able to do this. And this kind of loops us back around to some of the conversation we were having around proliferation of models. But, you know, what happens in practice for prosecuting this stuff, or for regulating it, if you have some rando who can spin up an instance on AWS and
have their own generator for Nirvana music or Amy Winehouse music? In practice, are you really going to be able to track all these things down? And what does this imply about almost the supply chain of AI and how closely it'll have to be monitored in the future? It's like we're going back to the days of Napster — you know, the implications changed and enforcement got a lot harder. Yeah, that's a good point. Yeah.
I will say one more thing, which is...
I guess aside from all these big implications, there are some fun kind of small things that this enables. So in terms of the video game dialogue, the positive aspects there is if you're a small video game developer who can't necessarily afford voice actors, or if you're making a really massive game with hundreds of thousands of pages, then for some side characters, you can generate audio instead of voicing them.
So that could cut down costs, make it possible to go bigger. And we've also seen something that I remember very clearly. There was this demo
of a game, a very simple game where you could walk around in a city and go up to random people. Remember? Yeah. The hot dog guy. The hot dog guy. Yeah. So there was a hot dog guy, and then you talk to him and his dialogue gets generated by GPT-3 and then synthesized. Oh, cool.
When you can get into these generative games that are limitless — you have just a character description, and from that you get dialogue and voice — it could make new frontiers of art possible. That's really true. When you're trying to look forward and predict the future of AI and its social impact, one of the perspectives that always comes to mind is, I think Peter Thiel said this at one point: he said the blockchain
is intrinsically libertarian, or anarchist, or whatever it was, and AI is — I don't know if he said communist or authoritarian, something like that. But just this idea of AI being a centralizing force, whereas blockchain is a decentralizing force. I think what this story shows is that the story is in fact a little bit more complex than that on the AI side, where you actually do end up with,
you know, through proliferation, individual actors who are far more empowered than they have ever been before. And new art becomes possible, new culture, new expressions of all kinds of different things. And that is exciting. That's a very exciting part of the story. And it's all kind of in one very complex salad that we're going to have to navigate in the coming years. Yeah.
Yeah, Sharon, we just talked about how you have a generative model of your own voice. I do. Have you had any fun with it, making it say weird stuff?
No, it can only talk about things that are in the distribution of what I usually say. What was your training data set? My training data set was my Coursera course, so it can talk about GANs and pretty much only GANs. So that's why. I think you may have found the one way around this copyright issue: if you actually make the algorithm that replaces you, maybe you can retain the rights to your voice.
I was thinking that, or, like, Morgan Freeman would at least have revenue share and have control and licensing power. And it would be easy for him to use: he could just type out what he wants to say, type out the expression markers he would usually have, and then approve whatever he hears — just say, yep, put me in there, put me in there. Yeah, now he won't have to go be in commercials. He can just, you know, outsource his voice and that's it.
Yep. And with visual effects, he also won't have to work out or whatever to look good — we can make him look good. Now's the time to share your Ethereum address so that Morgan Freeman can make a contribution and get some legal consulting done on this. Of course.
And then the final cherry on top is with machine translation, we can have him speak any language. And then we'll have Babelfish, essentially. And everyone can enjoy Morgan Freeman. Morgan Freeman for all. Which everyone deserves, really. So, you know, AI has some positive aspects, for sure.
And with that, thank you so much for listening to this episode of Skynet Today's Let's Talk AI podcast. You can find articles on similar topics to today's and subscribe to our weekly newsletter with similar ones at SkynetToday.com. Thank you, Jeremy, for joining us.
Thanks so much. And shout out, of course, to the Towards Data Science podcast listeners as well. I hope you check out the Let's Talk AI podcast. There are a lot of really cool topics like this, and I love this format. So if you're a fan of this podcast, you should check them out. And yeah, have a great — I was going to say, have a great afternoon. I don't know what time it is if you're listening, but have a great next few hours and we'll see you on the flip side. Yes. Yeah. So thank you.
It's out of distribution. So thank you, listeners, also for listening to the Towards Data Science podcast. And Let's Talk AI listeners, check out their podcast. They have, not so much news talk, but interviews with a variety of people, which is also very cool, and which we don't really do.
And yeah, subscribe to both our podcasts, rate us on iTunes — you know, the things we ask you to do — and be sure to tune in to future episodes of Let's Talk AI and the Towards Data Science podcast.