Hello and welcome to SkyNet Today's newly renamed Last Week in AI podcast, where you can hear AI researchers chat about what's going on with AI. That's right, this podcast is no longer Let's Talk AI and is now Last Week in AI, which is what most of our episodes are about as is. We just figured, why not change it?
Otherwise, this podcast will be exactly the same. And so in this episode, we will once again provide summaries and discussion of some of last week's most interesting AI news. I am Andrey Kurenkov.
And I'm Dr. Sharon Zhou. And this week, we're going to discuss the Tesla bot that was unveiled by Elon Musk on AI Day. We're going to talk about using synthetic voices to give Val Kilmer his voice back. We'll talk about a huge paper that came out of Stanford, as well as how AI has landed people in jail with not much evidence.
And finally, we'll end on some fun notes around AI and art, specifically music. Let's take it away. So first up, we have our application articles, starting off with the Tesla bot.
So if you want to look up an article, we are reading Elon Musk unveils Tesla bot, a humanoid robot that would be made from Tesla's self-driving AI. And so, yeah, this was revealed last week on AI Day, where Elon Musk
basically presented this project of the Tesla bot. And this is a humanoid robot, a general-purpose robot. It's not designed for anything in particular. And Musk said that a prototype might be out by 2022.
So I was actually at AI Day. And if you watch the recording, I did actually ask a question. Classic me, I asked about the simulator, whether it was a generative model.
And I really, I actually thought this was a joke, but it wasn't. And it's going to actually happen. And I think this is towards the direction of building out a safe AGI or just some kind of AGI that is more general purpose that Elon has been talking about for a while. And I think this is one step towards that where he's doing that through one of his companies, Tesla, probably the one that makes the most sense.
and grabbing the same technology used for self-driving, for Tesla cars, for this humanoid robot. Yeah, it's kind of interesting. The idea here is it would be designed to do boring, repetitious, and dangerous work. And one of the slides said it would be friendly.
A lot of roboticists and other news outlets responded really critically to this announcement. So there were articles like Elon Musk's Tesla bot is a shitpost.
Don't overthink it. Elon Musk's Tesla bot is a joke. And Elon Musk has no idea what he's doing with Tesla bot. And that's because these are very, very, very lofty goals. So, I mean, coming a bit from robotics, I can tell you that developing the hardware alone is incredibly complicated. And we are nowhere near any sort of AI that can actually enable robots like this to do anything useful.
So it does seem like one of Musk's sort of things that he says he might do and then doesn't follow through with, which he has a history of doing. But I mean, it's a fun kind of mock-up. And if they do make a prototype, I'll be excited to see it. Yeah, I think it does feel like a bit of a distraction, since a lot of the Tesla deadlines aren't met yet. But yeah,
you know, at the same time, and also with the big lawsuit happening, maybe this is intentionally distracting. That said, I do think he has been obviously very capable of executing and following through with a lot of lofty things that everyone thought were a joke initially. So I'd be kind of excited to see where this goes. That's true. Yeah. I mean, you've seen that with space and electric cars and,
Not so much self-driving yet, but they do have an all-star team and are doing a lot in that space. And actually, I found it interesting that at this AI Day, you know, for most of it, it was a pretty detailed presentation on kind of ideas and
techniques being used by the team. And the presentation was meant to convince people to join Tesla. It was kind of for recruitment. And so it was kind of interesting how technical it got and how much of it was for that. That was, I think, most of it, like 80 minutes or something.
And then at the very end, there was a six-minute section where he did this. So I do think it's also a bit of, you know, a grab for headlines. And Musk has said that he has done some stunts to get free marketing in the past. So I do think there's an element of that, but it does make a lot of sense for them to work on this. I think it's a bit of both. Yeah.
Yeah, definitely. It was a pretty technical session, which is fun. The software stuff was great to know about and learn what they're actually concretely doing. I would say they are doing some cool things, but it didn't quite feel like super state of the art in terms of where research actually is. That makes a lot of sense, though, since they need it deployed, actually in production and working, especially on the edge.
What was kind of more state of the art, I thought, was the hardware. Their take on it was really interesting with Dojo, and I definitely recommend you go watch it. One interesting thing with the hardware is that they came up with their own chip. It was a very opinionated design, and it felt like they were going after something that is going to
be able to handle that edge computing much better and handle smaller batch sizes much better. So it maybe makes sense for this vision task specifically, and it is a divergence, a very apparent divergence, at least to me, from what NVIDIA and Google are doing in terms of handling larger batch sizes, and also OpenAI for a lot of NLP tasks.
Yeah, I think my impression is there's been less interest in catering to edge. I mean, it's been a long discussion, but I think it's just less of a profitable thing. There's been more focus on data centers and so on that Google and Facebook have.
So, yeah, it'll be interesting to see this hardware come out. And it's kind of neat to see that now Facebook and Google and now Tesla all have their own hardware for this. So it seems like that's kind of a more and more usual thing that big companies do. And on to our next article, AI gave Val Kilmer his voice back, but critics worry the technology could be misused.
All right. So the actor Val Kilmer lost his voice due to a surgery for throat cancer back in 2015.
But the company Sonantic, a UK-based software company that essentially uses deepfake technology to clone voices for actors and studios, was able to reproduce his voice using just a small amount of audio, in a way that didn't break any kind of licensing. So now he essentially has a voice again that can be used.
And I thought it was really impressive to still follow those licensing constraints since, of course, we've seen people not do that a lot for audio and speech. But yeah, what's your take on this, Andrey? Yeah, this was pretty
cool news. So this is following up on a month ago, we had the Anthony Bourdain news story where they used it for a documentary. But in this case, what's interesting is actually Val Kilmer's team approached Sonantic last year in December with this project of restoring his voice. And according to the press release,
basically, Val Kilmer just owns a model. He can use it for personal use or professional use, and that's it. So it's a great example of how this technology can do good. I mean, of course, there are presumably many other people who have this issue of losing their voice for various reasons. And it seems like
this is a big deal, you know, Val Kilmer says so. It makes a lot of sense that this is something that has a lot of value and that will make a big difference in his life. That's positive. So yeah, very cool. For sure.
And onto our articles and research. The first is 100 plus Stanford researchers published 200 plus page paper on the AI paradigm shift introduced by large scale models. And the paper is titled on the opportunities and risks of foundation models. And so this is a hundred different researchers at Stanford,
many of whom Andrey and I know, including Fei-Fei Li, Percy Liang, etc. And they specifically talk about foundation models, defining this class of large-scale models that we then build off of as foundation models. So this includes a lot of these large NLP models that we've seen recently.
And, yeah, it's very much a kind of survey paper, discussing and defining what this is. There's no actual, you know, real novelty in it. But they do talk about interesting kinds of behavior from these foundation models, like emergence. So, like, the emergence of,
you know, behavior that we weren't expecting or that we didn't explicitly put in or construct. And, you know, I find that really interesting because I wonder if it's related to intelligence in some way. Like, do we as humans possess also emergent behavior? Or maybe I'm just interpreting that as, you know, second or third order behavior that wasn't explicitly constructed necessarily in our genetic code or something like that.
Yeah, so this paper, I think, made a big splash. And yeah, as you said, I think the biggest point and the most interesting point is sort of making the case that we need this term of foundation models as a distinct kind of category. And they have this argument of emergence and homogeneity.
So, well, I knew this was going on a little bit, so it wasn't huge news, but I found the discussion around it a little interesting. In particular, I think there have been
you know, some critical takes in terms of why are we defining this category? And I have wondered if we really need a separate kind of category of foundation models. The naming here is that these are foundations for other things, that the models can be used to do many different tasks down the line.
And to me, I mean, that's not new, right? We saw that early on with ImageNet fine-tuning, ResNet-50 being super popular, and other NLP models. And also in those cases, there were many different tasks that you could use those features for, right? So I do think...
I can sort of see some of the arguments in terms of basically this is a huge, huge model. So that means that not many organizations can do it. And so there's kind of limited access. But in that sense, I think maybe this wasn't the best category kind of definition to me. I'm not sure about that personally. What do you think?
I would probably echo those thoughts. It didn't seem like it was very necessary to name it, especially with the specific name, I guess. It didn't feel like the best name. I don't know what would be better, but I don't think foundation is how I would necessarily think about it.
But it is, you know, we've had these models for a while, like you said. It's interesting to kind of study the behavior around that. And I feel like in that sense, it's not necessarily a technical thing that's new. It's more of just...
It's almost a social science of how we perceive these models as humans. So maybe some HCI work there. Yeah, yeah. So the paper goes into it as a survey of what these can be used for and applications.
and also some of the implications for society. And so I think that's part of the initiative. And in fact, there's a new center at Stanford, the Stanford University Center for Research on Foundation Models. So I think the paper was sort of an introduction on that. And then later on, they'll actually present new research. And I do think that these models are certainly important
kind of a big deal. We've seen that with GPT-3, and not everyone can do this research. So I can see why we need something here, but I think, you know, honestly, these are huge, expensive models, I would say, not necessarily foundation models, as far as I can tell.
And onto our next research story, we have DeepMind Open Sources Perceiver I/O, a general purpose deep learning model architecture that handles a wide range of data and tasks. So this is about the paper Perceiver I/O, a general architecture for structured inputs and outputs.
And it goes into how DeepMind had this thing called Perceiver, which is cool because you could basically use the same neural net model for a whole bunch of different input modalities. So you could use it for music, for images.
for geometric shapes, without really changing the model very much, the structure of it, which isn't usually the case. Usually there are different neural net architectures for different types of inputs and outputs, whereas they introduced this general thing.
And then this PerceiverIO is kind of a follow-up where instead of limiting the output of the model to basically labels and not something big like an image or sound, they generalize it now so that outputs can also be kind of anything. And yeah, now the code is open and yeah, I think it's pretty exciting. What do you think, Sharon? Yeah.
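To make that more concrete, here is a minimal sketch of the Perceiver IO pattern in PyTorch. This is not DeepMind's released implementation (their open-sourced code is in JAX and far more elaborate); it only illustrates the core trick: a fixed-size learned latent array cross-attends to an arbitrary-length input array, gets refined by self-attention over the latents, and arbitrary output queries then cross-attend back to the latents, so inputs and outputs of different sizes and modalities can be swapped in.

```python
# Minimal sketch of the Perceiver IO idea (illustrative only, not DeepMind's code).
import torch
import torch.nn as nn

class TinyPerceiverIO(nn.Module):
    def __init__(self, dim=256, num_latents=64, depth=4, heads=4):
        super().__init__()
        # A small, fixed-size latent array, learned and shared across all inputs.
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.encode = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.process = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(depth)]
        )
        self.decode = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, inputs, output_queries):
        # inputs: (batch, M, dim) -- any modality, already projected to `dim`
        # output_queries: (batch, O, dim) -- one query per desired output element
        b = inputs.shape[0]
        latents = self.latents.unsqueeze(0).expand(b, -1, -1)
        latents, _ = self.encode(latents, inputs, inputs)            # read the inputs
        for attn in self.process:
            refined, _ = attn(latents, latents, latents)             # latent self-attention
            latents = latents + refined
        outputs, _ = self.decode(output_queries, latents, latents)   # write the outputs
        return outputs

# 1,000 input tokens in, 10 output vectors out -- both shapes are arbitrary.
model = TinyPerceiverIO()
x = torch.randn(2, 1000, 256)
queries = torch.randn(2, 10, 256)
print(model(x, queries).shape)  # torch.Size([2, 10, 256])
```

Because the heavy self-attention happens only over the small latent array, the cost grows roughly linearly with input and output length rather than quadratically, which is part of what makes one architecture practical across modalities.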
Yay, multimodality models, whatever, multimodal models. It's so great. I think, I mean, we kind of saw this coming, with, you know, models going in this direction. As we, you know, conquer the technology of
essentially embedding and representing images and text and speech and all these different modalities, it completely makes sense to put them all together, as, you know, CLIP and DALL-E kind of slightly made that progression.
I think it's also a progression of accepting larger inputs into models, and that is also related to problems OpenAI has been working on. So these are kind of two different streams of tasks that we are all working on, or a lot of the community is working on.
And what's also interesting is that, you know, it's removing a lot of this pre-processing needed on a lot of the data. So we no longer have to embed these explicit inductive biases like tokenizing text. And that's really exciting. I think this is, you know, the path towards these very general purpose models. And of course, I'm sure DeepMind views this as the path towards AGI. Yeah.
Yeah, yeah. I think it builds on the trend, which has been very exciting and a little bit surprising, I think, in the past few years where transformers have overtaken the field. And initially they were used just for natural language processing. But then people found that you could use transformers for images, right?
And now you can use transformers for graphs. So transformers are kind of being shown to be very general purpose. And now, you know, this Perceiver model is even more of a case where it is a transformer and it needs very little adaptation to be used for all kinds of inputs.
And why that's useful is, as people kind of study transformers and build, you know, code bases, now you can work on just this one type of model and get a lot of benefits on many different things, instead of, you know, working on individual different models and then, you know, things not transferring. So, yeah, I think this is a cool advancement.
And kind of a big deal. And onto our articles around societal impact and ethics. Our first article is Reddit user Reverse Engineers, what they believe is Apple's solution to flag child sex abuse.
Okay, so earlier this month, just not long ago, Apple announced that it's introducing a new child safety feature for its entire ecosystem. And as a result, it may need to scan or it is scanning the contents of iCloud and messages using edge computing on device machine learning. Okay, so it's doing that mainly to detect child sexual abuse material, CSAM, which is
you know, is a little bit controversial because people think this might be a big breach of privacy. And so following up on this very quickly, a Reddit user actually was playing with this, you know, the...
the hidden APIs related to this and has allegedly found, or reverse engineered, this NeuralHash algorithm used to detect that child sexual abuse material. And basically what he's done, or what they've done, is they found images that map onto the same hash.
And that creates a collision. And that's not great, because if you have an image that hashes like a problematic image, Apple may still look at your images. And of course, that can breach privacy. And I was really impressed with how fast someone was at breaking into this.
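To be clear about what a collision means here, below is a toy sketch using a very simple average hash. This is emphatically not Apple's NeuralHash, which is a learned neural network, and the images are just synthetic arrays; the point is only that a perceptual hash maps many possible images to one short fingerprint, so two different images can end up with identical hash bits.

```python
# Toy perceptual-hash collision (illustrative only; nothing like Apple's NeuralHash).
import numpy as np

def average_hash(img, hash_size=8):
    """Average the image over blocks, then threshold each block at the overall mean."""
    h, w = img.shape
    blocks = img.reshape(hash_size, h // hash_size, hash_size, w // hash_size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).flatten()

rng = np.random.default_rng(0)
img_a = rng.uniform(0, 255, size=(64, 64))   # some "image"
img_b = img_a * 0.5 + 100                    # visibly different: brighter, lower contrast

hash_a, hash_b = average_hash(img_a), average_hash(img_b)
print("pixels identical?", np.array_equal(img_a, img_b))    # False
print("hashes identical?", np.array_equal(hash_a, hash_b))  # True -- a collision
```

The reverse-engineering work demonstrated the adversarial version of this for NeuralHash: images deliberately constructed to share a hash even though they look nothing alike.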
Yeah, this was a really fast developing story. And it is kind of interesting that Apple basically introduced this idea of scanning everyone's photos, you know, and trying to detect particular types of photos, which goes against a little bit of their concern for privacy.
So that, I think, is what drew a lot of the criticism and also people being concerned that the AI model will incorrectly flag images that are fine and lead to problems. And so that's what people demonstrated with these hashes. That just means that images that aren't the same ended up being...
being matched by the algorithm. And that means that potentially, you know, you'll be on the watch list for Apple. But Apple responded that, first of all, this reverse-engineered thing isn't what they will ultimately use. And they also said that there are kind of multiple steps here, where this is one algorithm, there's a second algorithm, and then there's human verification.
And, yeah, so now I think the collision thing added fuel to the fire. But I think, you know, Apple will just go through with it anyway. And it's definitely been an interesting series of events, and I don't think there's been much of a precedent for this sort of thing.
What do you think about the need for this neural hash system and this idea that Apple has? Do you think it's a good thing to do or not? Oh, I think the government pressured them or something because they wanted actual access to data and the privacy thing just can't hold with big tech. Yeah. At the same time, I do feel like
At least this application is a good one to go for. And I guess as for the reason people are angry: Facebook is already doing this, and various companies, Google, are doing it. So I guess the thing is, now they'll be scanning things that are on your device, in iCloud, and in Messages.
But yeah, maybe I don't see the anger too much, because anything you upload to the cloud is already being processed by these sorts of algorithms anyway. And the concern that you'll be incorrectly flagged, I think, is not very real. I think that's not likely to happen. So I think a bit of the controversy is not well-founded. But the concern about privacy, of course, is, yeah.
I think I've heard it as an excuse for a lot of different companies to break privacy previously, including Clearview. So I think that's why people are most sketched out. And I think it's related to actually our next article, which is how AI-powered tech landed a man in jail with scant evidence. And it's possible that this person, Michael Williams, who was jailed last August,
did not actually kill a young man. He was accused of killing a young man from the neighborhood who asked him for a ride. And so there wasn't, you know,
sufficient evidence. Mainly the evidence came from a clip of noiseless security video showing a car driving through an intersection and a loud bang picked up by a network of surveillance microphones. And so this is yet another case of someone who had to go behind bars, for nearly a year in this case, before a judge dismissed the case last month.
Yeah, so the prosecutor in this case said that an AI algorithm by the company ShotSpotter detected, via its sensors, some signal that indicated that Williams shot and killed a man. So this AI-powered tech, as the title says,
was the reason that he went to jail. And he was in jail for a year before a judge dismissed the case, saying there was insufficient evidence.
So, yeah, definitely, you know, kind of a tragic case of a man being put in jail with insufficient evidence, and possibly by a flawed algorithm once again. I was not aware of this as a thing, but apparently ShotSpotter is kind of all over the place. They're in one hundred and ten American cities.
And the idea is, you know, they place different microphones and cameras and sensors and they classify 14 million sounds to detect gunshots or non-gunshots. And law enforcement says it helps to get officers to crime scenes quicker and better deploy their resources faster.
But now there's been this case, and an Associated Press investigation has also found flaws in using ShotSpotter as evidentiary support; for instance, the algorithm makes mistakes. So yeah, I think this is something I was unaware of, and it does seem like a pretty bad trend and another example of how AI can have negative consequences and
is being used to expand surveillance and expand use by policing, which is leading to these bad cases. One thing that I find really problematic is actually the way we're using AI plus human, like the human judge. Essentially the latency of the judge dismissing it, of that person sitting in jail for a year, is really bad.
And I just feel like this isn't how humans should be collaborating with AI. Like, this AI is like, oh, I think you did something wrong, so you sit in jail for a year until it gets dismissed. If there were an automatic way to dismiss it too, or like a much faster way, I feel like this would come under less fire, no pun intended. But yeah, it's just really unfortunate. This kind of collaboration doesn't make for good results.
Yeah, and we've also discussed and seen cases where police officers have used facial recognition results without follow-up research, follow-up investigation, and that led to several false arrests.
So this combination of human and AI, if a human doesn't understand the algorithm and possibly misuses it, that kind of amplifies problems that are already there. So definitely concerning and definitely something that on top of facial recognition now seems like we need to be aware of and fight against.
But to change the tone a little bit, you know, let's move on to our fun articles that are lighthearted. First up, we have Philip Glass on artificial intelligence and art. So the famous composer Philip Glass, who is renowned for his kind of minimalist work, had a conversation about
an OpenAI model trained on his music to produce Philip Glass-esque music. And there was a whole conversation, kind of a half hour discussion where he heard some of the output of the model and gave his thoughts. And it's pretty interesting and fun to hear his response. He critiques the output of the model, which
if you listen to it, is pretty high fidelity. It has no kind of artifacts and is clearly a piece of music. But he critiques some of the drawbacks of the model. And, you know, maybe to a non-composer this might seem like his type of music, but he really goes into how it's different. And yeah, it's an interesting conversation. And if you are a fan of his work, I'm sure this would be interesting for you.
I think he does say that, you know, this is promising. Though one quote from the interview that I found interesting was: that's what we like about art, the human part of it, and it's not here. However, there's a lot of ideas here. This could be made into an interesting piece, but the machine won't do it. At least that machine.
And I thought this was a really interesting quote because he's basically saying, you know, we need more of the human aspect in it. He can tell this is like made by machine and there's a lot of ideas. Like he said, there's a lot of stuff and maybe kind of pruning or extracting some of that stuff into something more meaningful could happen with maybe some kind of collaboration between the machine and the human. But as of right now, he doesn't feel like it's exactly ready or mimicking him
Yeah. And early on, he also asked what this was made for, which is interesting because, you know, it wasn't made for anything. The neural net just kind of output this random piece of music, as they do. And that also seemed interesting to me, that there was no sort of inspiration or kind of direction, obviously, on the neural net's part. And so, yeah, this does point to a direction where these models could provide ideas
and directions to these sorts of artists. And then they can then utilize that to create art. And I do think there needs to be some sort of, you know, human message, human inspiration to really make art. But AI can be a tool and, you know, kind of a muse in some sense. So yeah, cool to see this discussion, which I don't think there's many similar kind of articles that I've seen.
Right. And I think that brings us to our last article, the bouba-kiki effect and sound symbolism in CLIP. All right. Well, so we were talking about how, you know, there's not much purpose kind of injected into that OpenAI music model. But here, you know, with CLIP and VQGAN, which we've seen, you can put in a natural language word and get something out. And recently people have been
doing this by generating a bouba or a kiki. And the bouba-kiki effect is this phenomenon where humans show a preference for certain mappings between shapes and their corresponding sounds. So the bouba looks like this explosion, you know, spiky explosion thing. And...
Or sorry, one of them is a bouba, one's a kiki. One of the figures looks like this spiky explosion thing, and the other one looks more like a blob with curvier edges or curvier spikes. And yeah, it's interesting to see these models also going into this area of kind of nonsensical things, but that do
in some way also make sense to a lot of us humans, or a lot of us humans who, in this case, speak English. Yeah, yeah. So bouba is associated with kind of blobby shapes. It produces these circles usually. And kiki is associated with kind of sharp stuff, right,
and lines and stuff like that. And why this is interesting is, you know, this is kind of just an emergent property. This wasn't programmed in. This model, CLIP, was trained on words, you know, not these kind of made-up things like bouba and kiki, but because it was trained on kind of sub-level things, you know, phonemes,
this kind of emerged naturally. And it is pretty interesting that, you know, this idea of sound symbolism and its effect got replicated kind of by accident. And the model learning process just sort of led to it. So, yeah, I think definitely a fun discovery and also an interesting discovery. So you can check out this article for the images and see for yourself.
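If you want to poke at this yourself, here is a rough sketch of how you could score the two shapes against the two pseudo-words with OpenAI's released CLIP package (https://github.com/openai/CLIP). The image filenames are placeholders you would supply yourself, one curvy blob and one spiky shape, so treat this as an illustrative recipe rather than the exact setup from the article.

```python
# Rough sketch: compare "bouba" and "kiki" against a round and a spiky image with CLIP.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder file names -- supply your own drawings of the two shapes.
images = torch.stack([
    preprocess(Image.open("round_blob.png")),   # curvy, blobby shape
    preprocess(Image.open("spiky_star.png")),   # sharp, spiky shape
]).to(device)
text = clip.tokenize(["bouba", "kiki"]).to(device)

with torch.no_grad():
    img_feats = model.encode_image(images)
    txt_feats = model.encode_text(text)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    similarity = img_feats @ txt_feats.T   # rows: images, columns: ["bouba", "kiki"]

# The effect predicts the blob scores higher for "bouba" and the spiky shape for "kiki".
print(similarity)
```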
I think it makes complete sense based on how we tokenize inputs, especially because some, you know, like the tokens really do correspond to, I guess, things that could look different. And that's it for us this episode. If you enjoyed our discussion of these stories, be sure to share and review the podcast. We'd appreciate it a ton. And now be sure to stick around for a few more minutes to get a quick summary of some other cool news stories from our very own newscaster, Daniel Bashir.
Thanks, Andrey and Sharon. Now I'll go through a few other interesting stories we haven't touched on. Our research story concerns AI and healthcare. It's long been suggested that deep learning models will be able to automate parts of radiologists' jobs.
The promises haven't quite been met over the years, but according to a report from the Radiological Society of North America, it does seem that these models might enable AI-assisted diagnoses in the future.
Although there are still many improvements to be made, AI algorithms, when sufficiently validated, could aid in areas such as predicting breast cancer from a mammogram, measuring breast density, and screening for lung cancer. But these are all statements of potential. Dr. Constance Lehman, professor of radiology at Harvard Medical School and director of breast imaging at Massachusetts General Hospital, has been working on this problem.
She and her colleagues from Massachusetts General have worked with computer scientists at MIT to develop and study the use of AI to predict breast cancer risk. She says they have focused on how to identify specific problems, then develop and validate AI models in other populations. But the final step is the most important: rigorous evaluation of a model that is clinically implemented.
Our first business story takes us to China. China's largest search engine, Baidu, believes it will have to do more than advertising to stay ahead of the game. As CNBC reports, Baidu just unveiled its second-generation AI chip, its first Robocar, and a rebranded driverless taxi app.
The chip, named Kunlun 2, is supposed to help devices process huge amounts of data and boost computing power. It is built to be used in areas like autonomous driving and has entered mass production. While there is no word yet on whether it will be mass produced, the Robocar highlights Baidu's ambitions in autonomous driving.
Demonstrating its own work in the area, Tesla held its AI Day event this past weekend. After dense presentations about Tesla's work in computer vision, navigation, and AI hardware, Musk brought out a dancer in a spandex suit — a not-yet-real version of the Tesla bot.
The humanoid Tesla bot, according to Musk, will stand 5 feet 8 inches, weigh 125 pounds, have human-level hands, and eliminate dangerous, repetitive, and boring tasks. The way Musk presented it, the Tesla bot was the natural next step with Tesla's autonomous driving work.
Fortunately, if you're worried about the robot revolution, the Tesla bot can only run 5 miles per hour. The unusual presentation, although maybe not totally outrageous for Musk, did receive a lot of coverage. But a fully functional humanoid bot is probably still far away. According to The Verge, Musk's claims do sound outlandish when put in context.
Even Boston Dynamics, which makes the most advanced bipedal robot in the world, has never described its machines as anything but R&D.
In our final story on AI and society, a committee at the University of Texas in Austin advised against using AI software to oversee students' online tests, due to the psychological toll on students and the financial toll on institutions.
According to The Register, in UT Austin's case, the student anxiety caused by the system's invasiveness was not worth the small benefit they got from using it.
The report suggests alternative methods of proctoring students, such as using Zoom for small groups. Thanks so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed today and subscribe to our weekly newsletter with even more content at skynetoday.com. Don't forget to subscribe to us wherever you get your podcasts and leave us a review if you like the show. Be sure to tune in when we return next week.