Hello and welcome to Skynet Today's Last Week in AI podcast, where you can hear AI researchers chat about what's going on with AI. As usual, in this episode, we will provide summaries and discussion of some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter over at lastweekin.ai for articles we did not cover in this episode.
I am one of your hosts, Andrey Kurenkov. And I'm Dr. Sharon Zhou. And this week we will discuss some cool articles, starting with an AI voice acting tool that's getting better at recreating video game voices. We'll talk about how Microsoft has acquired an AI-powered moderation tool. We'll chat a bit about MIT's research on deep neural networks and how that explains how the brain processes language,
as well as a bit more about Clearview AI coming back, a bit more on Tesla coming back at us, and finally landing on a fun note related to DALL-E in Russia. Yeah, so some good variety going on this week. Starting off as usual with our application stories. We got the first one: AI voice acting tool xVASynth gets even better at recreating video game voices.
So this is kind of a fun story. We've already talked about something similar in the past, where there have been demonstrations that you can use AI to generate basically voice acting for video games. And in particular, people have done this for mods. So instead of having voice actors, you can make a mod and have characters that sound like those in the original game, but you don't need to actually have voice actors. You can just
have an AI do that for you. And it turns out that there's a tool created by a modder called xVASynth that is freely out there and available to be used by modders. And now there's a version 2.0, and it has a lot of features, a lot of voices in it, a lot of control over energy and things like that. So yeah, it's pretty cool.
I think this is really cool that it's open source. We were just chatting about how indie game developers don't have super high margins, so they would probably love to just grab these for mods or for use in their games. And I think this is a really exciting direction for synthetic voices.
Yeah, it's certainly cool. I do think it's a little bit of an ambiguous area, in that you can base the voices you generate on existing characters, like, I don't know, characters from BioShock Infinite or Fallout 3, particular characters or particular voices from these existing games,
which is definitely a gray area, because it's not like they licensed those voices from those actors, right? But then again, this is for mods. So I think as long as you stay in a sort of non-commercial space, I could see this being okay. But we have talked about it definitely being a question mark as to how this will evolve in terms of actors licensing out their voices and stuff like that.
I think them opening this up as an open-source tool first will invite that conversation, and hopefully they'll adapt over time. I think they just want to show that the technology works, and then hopefully they will be paying those actors or somehow striking a deal there. For sure. And it's interesting. Yeah, it's cool to see also that it's basically one software engineer who built this, Dan Ruta.
So cool to see what passionate people can do, I guess, in their spare time. And on to our next article: Microsoft acquires AI-powered moderation platform Two Hat.
And so it was announced that Microsoft has acquired Two Hat, which is this AI content moderation platform. The amount was not disclosed, but they've been working together for the past few years on proactively moderating gaming and non-gaming experiences for Microsoft. And it sounds like it's been fairly successful. So Microsoft has taken that in-house.
According to research, there's a considerable amount of online harassment. Four in 10 Americans have personally experienced some form of it. And yeah, it's important to have platforms like these helping with moderation. I know there are a lot of famous platforms that are known to be poorly moderated or very heavily user moderated, such as Reddit. Thoughts on this? Yeah.
Yeah, yeah. We were just chatting. It's interesting that it was founded in 2012 by a person who was a security specialist at Disney Interactive working on the safety and security team for Club Penguin.
So this person certainly has experience with the sorts of issues of cyberbullying and harassment on the web. And Club Penguin is especially used by children, who are, I would say, especially susceptible to these sorts of issues.
So certainly I think automated, AI-powered moderation is something we want to have to help tackle this huge problem, where human moderation is just not going to scale. And it's also interesting to see that Canadian law enforcement works with the company to train AI to detect child exploitative material. So yeah, overall, it seems like a good acquisition. Microsoft, of course, has Xbox, which is a gaming platform, so I think it makes a lot of sense for them to work together and make sure that gamers don't get too crazy.
Club Penguin definitely didn't have any harassment. No, I'm joking. It was definitely a breeding ground for cyberbullying back in the day. So a very apt person to be handling it. And just to be clear, as an extra note, they are also catching things that use bad grammar and awkward spelling, because that is rampant on the internet as a way to get around some of these moderation strategies. And so having a dedicated force for that, I think, is really important. And it almost makes me think of cybersecurity in general: like cybersecurity, cyberbullying is often handled by a separate entity, a separate company that is really, really focused on doing it well. Yeah.
Yeah, exactly. And I think, having been around since 2012, I would hope that they're pretty mature and not just, you know, a startup throwing deep learning at it, which is not going to work that effectively or that robustly.
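To make the point about misspelling-based evasion concrete, here is a minimal sketch of a toy moderation classifier built on character n-grams, which can catch obfuscated spellings that word-level features miss. The tiny dataset and the pipeline are illustrative assumptions, not Two Hat's actual system:

```python
# Minimal sketch of a text-moderation classifier that is somewhat robust to
# deliberate misspellings by using character n-grams instead of whole words.
# The tiny toy dataset below is made up for illustration; a real system is
# trained on far larger labeled corpora and uses much more than text features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "you are an idiot", "u r an id1ot", "nobody likes you", "go away loser",
    "good game everyone", "nice shot!", "thanks for the help", "see you tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = abusive, 0 = fine

# Character n-grams catch obfuscations like "id1ot" that word-level features miss.
classifier = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
classifier.fit(texts, labels)

print(classifier.predict(["ur an idi0t", "great game, thanks"]))  # likely [1 0]
```

Even this toy setup tends to flag "idi0t"-style obfuscations because their character n-grams overlap with the clean spelling; production systems layer far more context, user history, and human review on top.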
Moving on to some discussion of new research, we have "MIT's Latest AI Research Using Deep Neural Networks Explains How the Brain Processes Language." So this is about a paper called The Neural Architecture of Language: Integrative Reverse-Engineering Converges on a Model for Predictive Processing. A bit of a mouthful.
The basic idea is that we have these neural models, neural networks that process text and speech. So there are different kinds of tasks you can train neural networks for: question answering, autocomplete, things like that. And what they did was train neural networks to do these tasks, then have people do the same tasks, and then see if, after being trained, the networks' internal activations could predict
the neural activity of the people. So they had fMRI readings and checked whether the neural network could actually predict what happens in the brains of people as they do the same task. And they showed that, at least for one of these tasks, you get some amount of accuracy. You can, to some extent, predict what is going to happen in the human's brain,
as well as to predict how well the person would do. So interesting combination of machine learning and neuroscience here. What do you think, Sharon? You know what I think.
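To make the setup a bit more concrete, the core analysis in studies like this is typically a regularized linear regression from a model's internal activations to measured brain responses, scored by how well it predicts held-out data. Here is a minimal sketch of that idea, with random arrays standing in for real model activations and fMRI recordings:

```python
# Minimal sketch: predict (toy) fMRI voxel responses from (toy) language-model
# activations with ridge regression, and score the fit on held-out sentences.
# Real studies use activations from pretrained models and actual recordings;
# the random arrays here are placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_sentences, n_features, n_voxels = 200, 768, 50
activations = rng.normal(size=(n_sentences, n_features))  # model hidden states
fmri = rng.normal(size=(n_sentences, n_voxels))            # brain responses

X_train, X_test, y_train, y_test = train_test_split(
    activations, fmri, test_size=0.25, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)
pred = model.predict(X_test)

# "Brain score": average correlation between predicted and actual responses,
# computed per voxel over the held-out sentences.
scores = [np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean held-out correlation: {np.mean(scores):.3f}")  # ~0 for random data
```

With random placeholders the held-out correlation hovers around zero; the paper's claim is that activations from next-word-prediction models score well above such baselines.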
You were a bit skeptical. I'm super skeptical of stuff like this, especially when I see, I don't know, just this correlation between, you know, how neural networks think versus how humans think. So to dig deeper into what's going on, I think the article very much inflates what's going on wording wise. And I think the paper itself,
might be inflating some of that as well. Or at least it presents some results that are quantitative, that we can look at and pick apart. So basically, it's trying to look at language models that do next-word prediction. So the GPT variants are very much in that category.
And based on those, see if they can transfer-learn to predict how human brain scans go, and then look at the correlation or something like that. I do feel
very skeptical when it comes to these types of work, because I don't often see baselines like: well, what if you transfer-learn to something else, regress to some other task? That could be just as easy. It doesn't necessarily mean that the patterns
learned by the language model are the same as the patterns the human brain uses. So I am somewhat skeptical of that. Also, I believe, and please correct me if I'm wrong, but it does look like the humans were also supposed to do next-word prediction. And that doesn't mean that
we naturally do it in the way that the experiment presented it, right? Like when I read things, I'm not sure I'm doing next-word prediction exactly in my head, but they presented it to humans that way. Yeah, yeah. I think those are good reasons to be skeptical.
And it doesn't mean that deep neural networks work the same as human brains, or even similarly. So it's not too clear how to really interpret this, aside from: there's some correlation, so it's maybe in some ways similar. It's also interesting, just looking at the paper, which has a lot of details, obviously, that we can't cover, but one of the weird details is that you also get a correlation if you
don't train the neural network. If you just have a large untrained transformer model, it also correlates on this task. So yeah, it's not too obvious. In the abstract, they do say that
This is consistent with a long-standing hypothesis that the brain's language system is optimized for predictive processing. So I would say I would give the benefit of the doubt to these researchers who are probably much more informed as to some of the details here. But yeah, it shouldn't be taken as anything too meaningful aside from sort of a headline that
this tells us a little bit that there might be some correlation, and that's interesting, and we should continue studying it. Right. And onto our next article, Making Machine Learning More Useful to High-Stakes Decision Makers. And this is an article that is based on the paper, Sibyl: Understanding and Addressing the Usability Challenges of Machine Learning in High-Stakes Decision Making.
Okay, so this is very much a study around usability and the downstream use of machine learning: when we put it in the hands of users, what do they think of a model's predictions? And what these researchers have found is that when people are given an analytics tool and they see what's going on, they actually want to know the factors going in, the why behind the answers they're getting. And they specifically examined the usability challenges in child welfare screening. So this was done in collaboration with the child welfare department in Colorado.
And they looked at how these call screeners were assessing their cases with the help of machine learning predictions. And basically, the call screeners really wanted to see why a machine learning algorithm predicted a certain risk for a child. And this is predicting whether a child will be removed from their home within two years. So a very important kind of thing to be predicting there.
Yeah, exactly. So this is saying like instead of understanding how internally this works or like visualizing the network structure or weights or whatever, you really want sort of a high level explanation of why does this model think what it does? Why is it making this prediction? And should I trust it or shouldn't I? And that helps a lot with understanding how to interpret those predictions.
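As a concrete illustration of that kind of factor-level explanation, here is a minimal sketch that lists each feature's contribution to a single prediction from a simple linear model. The feature names and data are hypothetical stand-ins, not the actual child welfare screening inputs or the Sibyl tool itself:

```python
# Minimal sketch: explain one risk prediction from a linear model by listing
# each feature's contribution (coefficient * value). The feature names and data
# here are hypothetical stand-ins, not the real child-welfare screening inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["prior_referrals", "caregiver_age", "open_cases", "prior_services"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))        # toy training data
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

case = X[0]                                            # one case to explain
risk = model.predict_proba(case.reshape(1, -1))[0, 1]
contributions = model.coef_[0] * case                  # per-feature contribution

print(f"predicted risk: {risk:.2f}")
for name, contrib in sorted(zip(feature_names, contributions),
                            key=lambda t: -abs(t[1])):
    print(f"  {name:>16}: {contrib:+.2f}")
```

The study's point is that screeners engage with this kind of "why", the factors pushing a score up or down, far more readily than with raw scores or model internals.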
And yeah, I think this study is really cool. This has been going on for two years. The researchers looked at seven factors that make models less usable. So there were things like lack of trust and disagreements between humans and the model. And then some of them flew to Colorado to actually work with call screeners in a child welfare department.
And so they, you know, did a user study where these call screeners actually interacted with the tool, and they observed teams of screeners for like 10 minutes at a time to understand how this works. So super, super important research, I think. There are a lot of applications where things like this would be useful, obviously, for medicine, for other things.
And I think this sort of human-computer interaction research, HCI, is less common with AI, but this definitely shows that it can be very useful. And on to our ethics and society stories. And first of all, we have one thing that we keep coming back to. We've probably talked about this like half a dozen times: Clearview AI.
So if you haven't listened to our many discussions of it, Clearview AI is a company that sells facial recognition. You can take a photo of a person and that will match that photo to someone's identity. It'll give you the name of who that person might be based on their face.
And they are selling this commercially. So they scraped like 10 billion photos from the Internet. You may be in their database. There is, in fact, a good chance that you are. If your image is out there publicly, they may have scraped it and included it in their database.
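Under the hood, this kind of matching is typically done by converting each face into an embedding vector and searching a database of embeddings for the nearest match. Here is a minimal sketch of that idea, where embed_face is a hypothetical placeholder for a real face-embedding model and random vectors stand in for actual photos:

```python
# Minimal sketch of embedding-based face identification: embed every database
# photo once, then find the closest embedding to a query photo by cosine
# similarity. embed_face is a hypothetical placeholder for a real
# face-embedding model; random vectors stand in for actual photos.
import numpy as np

rng = np.random.default_rng(0)

def embed_face(photo) -> np.ndarray:
    # Placeholder: a real system would run a face-detection + embedding network.
    return rng.normal(size=128)

database = {name: embed_face(None) for name in ["person_a", "person_b", "person_c"]}

def identify(query_photo, known_faces, threshold=0.5):
    q = embed_face(query_photo)
    q = q / np.linalg.norm(q)
    best_name, best_score = None, -1.0
    for name, emb in known_faces.items():
        score = float(q @ (emb / np.linalg.norm(emb)))   # cosine similarity
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)

print(identify(None, database))  # random embeddings: no confident match
```

At the scale of billions of photos, the linear scan here would presumably be replaced by an approximate nearest-neighbor index, but the matching logic is the same idea.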
So it's been pretty controversial; there are a lot of lawsuits going on. But this story is about how they underwent testing by the National Institute of Standards and Technology to basically evaluate the accuracy of their algorithm. And in that sense, they did
somewhat well. Their product, at least, is accurate in correctly matching two photos of the same person, although that's not what they sell. So, you know, something: at least their algorithm works sort of well, but maybe not on what we really care about. What do you think, Sharon?
I think it's first great that there is kind of
this study going on, or rather this test going on. It is interesting that the Clearview CEO called the results, quote, an unmistakable validation of his company's product. So it does feel like this confirmed that they're, you know, at least accurate
among other similar companies and algorithms. But I don't think it necessarily means they're matched up to the best, or that they took data that was theirs to take, especially if it was like 10 billion photos. So, yeah. Yeah.
Yeah, it's not necessarily going to make me feel better about them. And this test they took was not even what they sell. They sell: take a photo and match it to a name. This was just testing whether you can predict if two photos are of the same person or not. And they haven't released this other test for some reason. So yeah, not telling us much. And yeah,
it's good to know that it works well, you know, since federal police agencies use it and things like that. We don't want it to not work well, but all the ethical issues, all the reasons this might not be needed in the first place, are still there. And on to our next article in ethics: Tesla pulled the latest FSD beta from owners' cars today.
And FSD stands for full self-driving. So Tesla actually did roll back their latest FSD beta because it was having some issues and this was very unexpected. And Elon Musk actually did confirm the downgrade and said that
actually that this only underscores the need for public betas. And it was because there were some issues with left turns at traffic lights, and they're working on the fix. So it's good that they, you know, roll things back if things aren't working well. But it definitely feels like things are moving very quickly in this space and being rolled out very quickly.
Yeah, it was interesting. I saw when this happened, there were multiple Reddit posts reporting that their car did something weird after the update. So I guess it was kind of pretty widespread. And yeah, this is an interesting story. I think this idea of beta testing software for self-driving is...
Interesting, because obviously if you really are going all out with your beta testing, then you could have actual accidents where people who haven't signed up for this beta program get hurt. But at the same time, Tesla is limiting who can have the beta by their safety score, according to their metrics. And beta testing does mean that eventually you might roll it out in a safer way.
So, yeah, I don't know. It's, I think, not obvious to me whether this is the best way to do it or not. But I guess the good news is they could roll it back as soon as they saw issues, and nothing bad actually happened. Right, right.
And onto our last article that is fun. It is titled ruDALL-E: Generating Images from Text Descriptions, or the Largest Computational Project in Russia.
All right. So DALL-E was a model announced by OpenAI. It's a multimodal model over text and images where you can enter a text description and get new images back: you could famously enter "avocado chair" and it would produce images of avocado chairs that were completely new. And since the release of DALL-E, Chinese researchers have been working on this, but also researchers in
Russia, as announced recently. So they came out with what is known as ruDALL-E, you know, a very, very different name, I see. The XXL version, the largest one, is 12 billion parameters. So very cool that this is coming out everywhere; it does sound like it is useful. And it sounds like Russia is once again in the game.
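For a rough picture of what DALL-E-style models do under the hood, here is a minimal, fully hypothetical sketch of the autoregressive text-to-image pipeline: a transformer continues a sequence of text tokens with image tokens, and a discrete VAE decoder turns those tokens back into pixels. The toy tokenizer, transformer, and decoder below are placeholders for the real pretrained components (such as ruDALL-E's 12-billion-parameter model), not an actual API:

```python
import torch

# Dummy stand-ins so the sketch runs end to end; a real system would load
# pretrained weights (e.g. ruDALL-E's 12B-parameter transformer and its VAE).
VOCAB = 128          # joint text+image token vocabulary (toy size)
IMAGE_TOKENS = 16    # real models use on the order of a thousand image tokens

def text_tokenizer(prompt):                      # hypothetical placeholder
    return torch.tensor([[hash(w) % VOCAB for w in prompt.split()]])

class ToyTransformer(torch.nn.Module):           # hypothetical placeholder
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, 32)
        self.head = torch.nn.Linear(32, VOCAB)
    def forward(self, tokens):
        return self.head(self.emb(tokens))       # [batch, seq, vocab] logits

def vq_decoder(image_tokens):                    # hypothetical placeholder
    return torch.rand(1, 3, 8, 8)                # "decoded" RGB image

@torch.no_grad()
def generate_image(prompt, transformer, temperature=1.0):
    # Encode the text prompt, then autoregressively sample image tokens.
    tokens = text_tokenizer(prompt)
    for _ in range(IMAGE_TOKENS):
        logits = transformer(tokens)[:, -1, :]
        probs = torch.softmax(logits / temperature, dim=-1)
        tokens = torch.cat([tokens, torch.multinomial(probs, 1)], dim=1)
    # The sampled ids index a discrete VAE codebook; decode them back to pixels.
    return vq_decoder(tokens[:, -IMAGE_TOKENS:])

print(generate_image("an avocado armchair", ToyTransformer()).shape)
```

The real models sample on the order of a thousand image tokens per picture and decode them at much higher resolution, which is part of why projects like this are so computationally heavy.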
Yeah, I think people on Reddit were pretty excited by it. You know, when all this VQGAN stuff that was popular with CLIP came out, it was interesting to see there was this sort of
community activity of developing these Colab notebooks so everyone could just hop on and play with it. And something similar is happening here. Now that the model is open source, people are making it usable in English and making it super easy to just type in text and see images. And yeah, as we've seen with VQGAN+CLIP, it's a lot of fun to do these sorts of image-from-text things.
I will say their avocado chairs are a little underwhelming compared to the OpenAI DALL-E ones. But still, these look really nice. So if you haven't seen DALL-E, you might want to look it up, because it is a bit surreal how good the images that you get from text are.
Right. And these images look really great, too. So, yeah, they don't look as good, but they do look very, very good. And so I guess let the race begin, you know, or continue rather. It's actually quite exciting to see the different countries, in a sense, compete. Yeah, I found out from this article that there's also a Chinese model
released that has four billion parameters. So yeah, I think that's good. You know, we've also seen this with GPT-3: OpenAI developed it but hasn't released the model, and then others have gone on to recreate at least a version of it and are working on it. So it's cool to see that
maybe you can't do anything truly proprietary; if the basic idea is known, then others will release it open source. And with that, thank you so much for listening to this week's episode of Skynet Today's Last Week in AI podcast. You can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at lastweekin.ai.
As usual, with any podcast, please subscribe if you aren't. And if you are, please give us a rating and a review on Apple or Spotify or whatever. It would help us a lot. Be sure to tune in next week.