Hello and welcome to Skynet Today's Last Week in AI podcast, where you can hear AI researchers chat about what's going on with AI. As usual in this episode, we will provide summaries and discussion of some of last week's most interesting AI news.
I'm Dr. Sharon Zhou. And I am Andrey Kurenkov. And this week, we'll discuss Google Sheets' new AI autofill, NLP benchmarking, some issues with hospitals' AI tools, and some fun applications of AI to plays and helping astronauts.
So let's dive straight in with our first application news. This is titled Google Sheets Formula Suggestions Are Like Autofill for Math.
So this just came out, I think, recently, where Google has announced kind of quietly that Google Sheets is getting the ability to intelligently suggest formulas and functions for your spreadsheets based on your data. So, for instance, you know, if you have a column and you just press enter, it can suggest a SUM or AVERAGE or whatever.
And it's intelligent in the sense that it will take context into account. So if you have, like, a column named Total, then it will suggest SUM, and so on. So, yeah, pretty interesting.
Honestly, it's about time. I thought we should have had a lot of this, even rules-based detection, much earlier on, since I'm pretty sure a lot of spreadsheets are pretty boilerplate anyways. But this is awesome. This is also probably a bit of a reaction to OpenAI and Microsoft showing off GPT-3 doing the same thing in a Word document, and so very close to Excel there.
And yeah, I hope that this will be useful. I imagine it will be for all the things that I've had to type formulas in for. I know. Yeah, this is definitely something that makes a lot of sense. And given sort of trends, as you said, it kind of is about time.
There have been some similar things, in terms of like autocomplete and so on, and for any specialty uses I'm sure there are analogs, but I guess the interesting thing here is the intelligence bit.
And right now, I think it's not quite as impressive as something like Codex. Hopefully, they keep working on it and, you know, improve the lives of everyone who works with spreadsheets, which I would imagine is, you know, maybe even more useful than improving code production. Yeah.
And what's great is that you can easily, I mean, train this model. I mean, they can, because they have all the data and it executes, right? So you can see it actually doing the right thing before and after applying the formula if you were to, let's say, mask out some of those values. So it seems like a straightforward thing to implement without anything kind of dicey coming out of the model. Yeah. Yeah.
It seems like there wouldn't be any license issues, and they could make it pretty careful, because it's a pretty narrow domain, to not output, like, secret, you know, passwords or whatever issues we've seen with Codex.
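For the curious, here is a toy sketch in Python of the masked-cell idea discussed above: hide a cell's value, execute a handful of candidate formulas over the rest of the column, and keep the ones whose result matches the hidden value, using context like the column header to break ties. The candidate list, function names, and ranking heuristic are all hypothetical illustrations, not Google's actual system.

```python
# Toy sketch: verify candidate spreadsheet formulas against a masked-out cell
# by executing them. Purely illustrative; not Google's implementation.
candidate_formulas = {
    "SUM": sum,
    "AVERAGE": lambda xs: sum(xs) / len(xs),
    "MAX": max,
    "MIN": min,
}

def suggest_formula(column_values, masked_value, header=""):
    """Return candidate formulas whose result reproduces the masked-out cell."""
    matches = []
    for name, fn in candidate_formulas.items():
        if abs(fn(column_values) - masked_value) < 1e-9:
            matches.append(name)
    # Use context (e.g. a header like "Total") to rank matching candidates.
    if "total" in header.lower() and "SUM" in matches:
        matches = ["SUM"] + [m for m in matches if m != "SUM"]
    return matches

print(suggest_formula([10, 20, 30], 60, header="Total"))  # ['SUM']
print(suggest_formula([10, 20, 30], 20, header="Score"))  # ['AVERAGE']
```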
And on to our next article titled More than 50 robots are working at Singapore's high tech hospital. And this is actually the Changi General Hospital, CGH, in Singapore. And more than 50 members of their staff are actually robots. So they do anything from performing surgery to doing administrative work.
But they've become this integral part of the hospital's workforce, which is really exciting. And the best known ones are the da Vinci surgery robots, which you might have heard of that help surgeons with doing, you know, minimally invasive surgeries.
But outside of that, the robots are doing delivery of food or linens or cleaning, helping with maintenance, helping with patient rehabilitation, helping patients get back into bed.
And also being a social partner with patients. And I find that really, really interesting. There are some, I think, serious patient outcomes being seen in the reduction of sedatives, because a lot of older patients get a robot companion and that makes them a lot happier, which is exciting to see.
I totally agree. I think this is one of the most exciting areas in robotics that I don't think many people are aware of, but it kind of has been emerging for a while. And it makes a lot of sense, right? There's no worry here of replacing jobs because there's not enough people to get around to it anyway. And these robots are really fulfilling kind of the things that take up people's time and that
robots could do. So like delivery, you know, these are things that nurses currently have to do, but they could spend their time on better things, as I think we've discussed in the past.
And the social robots, which are like, you know, the Paro, the little animal, I think a seal, and so on, are again taking the role of elderly care professionals, who there's also a shortage of, with aging populations in Singapore and elsewhere.
And they have been shown, as you said, to really have impressive outcomes. So, yeah, I think this is a great trend and I'm glad this article is highlighting that. And I do think it's impressive that, you know, in this 1,000 bed hospital, there are this many robots because I did not think there was this much adoption. So it's cool to see that some hospitals are really pushing for it.
Yeah. And I think this is, you know, much of it is very cultural. Like you mentioned, it's not controversial in any way because they don't have enough staff. So they need robots. So no one's job is really being replaced in Singapore, given the elderly population and what's known as, quote unquote, the three tsunamis situation,
which basically means the aging population is one tsunami, the shrinking workforce is another tsunami, and the increase in chronic disease overall in health care is the third tsunami in medicine. So this is just very important for them to have something like this.
And I think there has been increased demand in other countries, such as Denmark, where, you know, that kind of disparity is also seen between, you know, the workforce that is shrinking and the aging population and need for health care that is increasing. Yeah, I'd be also interested to take a look at how
this is happening in Japan, where I think many people are aware there is an aging population. And of course, they're quite fond of robots. I wonder how common this is there. And on to our research section, where we highlight some news with respect to AI research. First up, we have the blog post, Challenges and Opportunities in NLP Benchmarking by Sebastian Ruder.
So this is, you know, as the title says, all about kind of the current state of NLP benchmarking and where we can go in the future, given where we are now. And it's quite a detailed look at kind of the state of things. So it goes over what a benchmark is, a bit of its history, and then a bunch of directions for where we can improve in terms of metrics, downstream use cases, fine-grained evaluation. It has...
Many recommendations. It is a very good kind of overview survey and definitely an interesting read for anyone who wants to get a picture. And I think even for non-researchers, this is a good way to sort of get caught up to the state of things in terms of kind of one of the big challenges and trends in AI.
Yeah. And overall, I will say, you know, the blog post does mention that a lot of the benchmarks we have now are, you know, outdated. We're just getting superhuman, quote unquote, performance on them almost immediately. But I'd just like to note that, you know,
we're going to see that on old benchmarks all the time. Benchmarks will necessarily need to continually improve. I mean, I did my thesis on this, so just reflecting a bit. It's just the nature of benchmarks for that to happen. Because if they're too hard, let's say human performance is actually far out of reach, then we can't optimize for them. We essentially have no gradient, right? We have no way of knowing how to make that better.
So you actually want benchmarks that are in that sweet spot where we see improvement, but performance isn't quite surpassing the limits of the benchmark. And I think right now we're kind of struggling with finding the next set of benchmarks that will take us through the next areas of research. But it'll be really interesting to see where things go there.
Yeah, that's a fair point. And I think it's also worth noting that this blog post is specific to NLP, where I think this is a bit more notable. I think in computer vision, there are now a lot of these more niche tasks, like few-shot learning and compositional learning and a lot of these different things. And those are not being, you know,
solved, so to speak, as fast, whereas in natural language processing, there's been an effort to, you know, have benchmarks that sort of evaluate general sort of language skill.
And that's been really the challenge, with things like SQuAD and GLUE being rapidly solved. And the point being made here is that despite these benchmarks being beaten, in the sense that the performance exceeds human performance very quickly, you know, that doesn't mean that the AI models actually can do these tasks. So another point here is that so far the way we design benchmarks
doesn't seem to work in terms of, you know, being, as you said, a good enough challenge to really test the skills as intended. And so I think it's quite interesting, kind of, this survey on how to design a benchmark and, you know, what people are thinking as far as next steps.
And on to our next article, Google Brain Uncovers Representation Structure Differences Between CNNs and Vision Transformers. And this is about the paper, Do Vision Transformers See Like Convolutional Neural Networks?
All right, so vision transformers are essentially transformers that are being used for vision tasks, so computer vision. And prior to using these, prior to transformers really taking over the world and attention being all you need, convolutional neural networks, CNNs, were mainly used for computer vision.
And this paper kind of drills down and lets us see, you know, how are these two different model architectures, how are they different and how can we visualize that? Can we see how they compute representations differently and what is going on and does that make sense? And I think one striking thing is that the vision transformers compute representations
differently than, you know, ResNet, which is, you know, a very basic CNN. And vision transformers actually more strongly propagate their representations between lower and higher layers versus ResNets.
And I think what's interesting there is it's not super surprising to me, because a vision transformer is looking at attention at every single layer and it's kind of honing in on what it's trying to look at, versus with convolutional neural networks, you're very much layering and aggregating disparate information into salient features as you go along.
Yeah, what are your thoughts on this, Andrey? Yeah, I found this pretty interesting. Some of the results are not very surprising, as you said. So for instance, in the early layers, closer to the input, there's more global information than in ResNet, which makes sense due to the design of transformers.
But some of them are pretty interesting for sure. Like the fact that the representations in the lower and higher layers of transformers are similar, because kind of the traditional understanding of deep neural nets has been that you create this sort of
you know, successively more abstract representation. So at early layers, you have circles and lines and whatever. And then in the middle layers, you have more complex shapes. And then sort of at the later layers, you have concepts like dog and cat and whatever. So that's been kind of a common wisdom. And it looks like with Transformers, that's not quite so much the case, which is curious. And yeah, in general, I think this is
a very good type of paper, which we don't see quite enough of in AI, sort of an empirical study and really an investigation that leads to deeper understanding rather than better performance per se. And I always love seeing these kinds of papers. Right. And I think it's quite interesting to note these very structural differences in the two different architectures, because then
it really just makes you think, okay, these are different models. You know, these are different architectures. They are doing very different things to complete the same task. Because I think sometimes it's, oh, look at this iterative, basically small little contribution that's not really making a drastic change. I think this is very much highlighting that this is a really drastic change in architecture. It's doing something different when it's trying to learn visual perception. Yeah.
Yeah, and it's interesting how, you know, CNNs in the 90s were very well kind of motivated by human perception when they came out, actually in the 80s. Whereas I think transformers are a bit more of a sort of just discovery that happened. You know, it was just sort of an iteration on ideas that existed for a while, especially in NLP.
And so I guess there would be some hope that if we do more of these kinds of studies, you know, what is the next transformer? What is the next thing to really be a big discovery in AI? Maybe we can have a more principled way to figure that out and find this next architecture, if we have a deeper understanding of how these ones work and what, you know, possibly are their deficiencies and strengths.
And enough on that. Hopefully we didn't get too technical for any non-AI researchers, but, you know, it's kind of a fun paper and sometimes we like to geek out.
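For listeners who want to see how this kind of layer-by-layer comparison is done, the paper's representation analysis is based on centered kernel alignment (CKA), a similarity measure between two sets of activations. Below is a minimal sketch of linear CKA in Python; the activation matrices here are random stand-ins rather than real ViT or ResNet features.

```python
# Minimal sketch of linear CKA (centered kernel alignment) for comparing
# layer activations. The activations below are random placeholders.
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (n_examples, n_features)."""
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Compare the example-by-example similarity structure of the two layers.
    hsic = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro"))

# Compare a "lower" and a "higher" layer for the same batch of inputs.
rng = np.random.default_rng(0)
lower = rng.normal(size=(256, 768))   # stand-in for early-layer activations
higher = rng.normal(size=(256, 768))  # stand-in for late-layer activations
print(linear_cka(lower, higher))      # values near 1 mean very similar representations
print(linear_cka(lower, lower))       # identical representations give exactly 1.0
```

Computing this score for every pair of layers is roughly how similarity maps like the paper's are produced, which is what supports the observation that ViT representations stay more uniform from lower to higher layers than a ResNet's.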
Next, we have our articles dealing with ethics and society and concerns about AI. And the first one here is titled Flying in the Dark: Hospital AI Tools Aren't Well Documented. And this is by the Stanford Human-Centered AI Institute. So this is about a new study titled Low Adherence to Existing Model Reporting Guidelines by Commonly Used Clinical Prediction Models. And so researchers at Stanford
looked at, as the title says, the documentation for a dozen AI models for clinical decision-making, all of them in commercial use, compared it to 15 different sets of guidelines, and found that
the guidelines were not followed very well in the documentation. In some cases, you know, worse than others, but in general, none of them were quite stellar. So interesting for sure. What do you think about this, Sharon?
I think what's interesting is that they were looking at models specifically developed by Epic, which is one of the leading providers of EMRs, or electronic medical records. So they're, you know, a multi-billion dollar company, and their systems are in
a large percentage of hospital systems, and they are essentially the closest to deployment, right, of these models, of these AI models, because they can see all that patient data. They can very much integrate with the workflow of the doctor and administration at the hospital. So I think what's really interesting is that this group of researchers looked specifically at Epic Systems' AI models.
Yeah. And I wonder why. Maybe that's just because they're a leading provider and their things are actually deployed and commercially used, where I think a lot of tools are not. And I would presume also these AI models are maybe not quite cutting edge in terms of research.
But still, this is also setting a good precedent for future deployment in terms of people keeping an eye out. And it's interesting that they found that some guidelines were followed, but things were especially weak on documenting evidence that models are fair, reliable, and useful. And we've often discussed evidence that models were biased,
or were not actually useful or reliable. So that seems very crucial for deploying new AI tools, and it's interesting that there's evidence that the documentation there is lacking.
Yeah, I think it's really important that they looked at Epic's models because they are closer to deployment, right? And I think they would think about, or we would expect them to think about, these things a little bit more, these guidelines, much more than a research paper, say, written by mainly AI researchers. But I think this illuminates, you know, here are some of the shortcomings we still have, and here's what's needed for us to get a little bit more comfortable with deploying them.
That said, I'm very curious if we were to backtrack a couple articles ago, what Singapore thinks about a lot of these things in their robots and whether they use AI and how they've been thinking through some of that, because they seem to have seen a decent amount of success with robots and AI.
Yeah, I think it's always, you know, with AI models, you know, for instance, disease diagnosis or other applications we've seen, you know, you always need to take risk into account. So with a lot of robots, aside from da Vinci for surgery, which is very well established,
I think delivery is something where it's a little less high risk and robots are designed to sort of be pretty slow and easy to avoid. Whereas a lot of these AI tools in medicine are maybe somewhere where bias is more likely and you really need to be more careful.
And I found it interesting, you know, reflecting on our origin at Stanford, that this is from the Stanford Human-Centered AI Institute, which kind of came out and it was a little ambiguous why it's needed. But it's interesting to see these sorts of studies coming out of there. And, you know, in some sense, I think,
it demonstrates the usefulness of this institute in terms of focusing more on ensuring that AI is being used in sort of the correct ways out in society, which maybe industry wouldn't be as motivated to do, especially not this sort of overview study. And on to our next article, Toyota pauses Paralympics self-driving buses after one hits visually impaired athlete.
All right. So as the article title suggests, Toyota had these self-driving buses for the Paralympics, but one of them did unfortunately hit one of the athletes, who was visually impaired. And Toyota has issued an apology for the, quote unquote, overconfidence of the self-driving bus, and it has
suspended service temporarily. Right now, the Japanese athlete who was visually impaired and was hit will actually be unable to compete in his event this weekend. So that's really, really unfortunate. And this has mainly been showcased
as part of Toyota's sponsorship of the Tokyo 2020 Olympics. Yeah, this is, you know, kind of disappointing for sure, because, you know, you would think that these are lower stakes and easier cases for self-driving buses, because it's really just ferrying people. But then this happens.
It's interesting to me that Tokyo police said that there were vehicle operators who said that they were aware that a person was there, but thought this person would realize that a bus was coming and stop crossing the street, which is a little strange, I guess. And yeah, luckily the person isn't too injured. I think he's recovering well, but...
kind of a demonstration of how far we are from really using these sorts of things in a truly reliable way. Yeah. And the athlete himself is not actually showing any outrage, which I guess stereotypically an American athlete would. His coach actually said, quote, he wanted to take good care of himself. We feel regret, but I think he is the most disappointed. Yeah.
Yeah. So kind of a sad story, but, you know, hopefully it's a good lesson that you need to really be more careful with these sorts of things.
But to lighten the mood, we have our final set of articles in the fun category, starting with our first one from The Guardian. Rise of the Robodrama. Young Vic creates new play using AI. So...
Young Vic is this big theater and it has a new show titled AI that explores how AI can be used for a play. And it's sort of weird. It's not really a play per se. It's this interesting format where you see the production staff, the writers and producers of the play sort of
interacting with GPT-3 in real time in front of the audience and trying to write a play interactively. So it's, you know, you see it coming together live, and then at the end of the evening they perform this brand new play that they developed using GPT-3.
So not what I would have expected. And I don't know, it sounds pretty intriguing to me. I think I would have wanted to see it. What do you think, Sharon?
Yeah, that sounds really fun to watch and reminds me of, you know, a lot of people have been using GPT-3 for great artistic applications such as this one. It reminds me of how I met one of the Yes Theory guys and he told me that they use GPT-3 to determine what their next adventure should be and basically narrate their next adventure. Yeah, and it's interesting. I think this is
I was a little skeptical when I saw the headline, expecting sort of some pretty lame thing, but I think this is sort of precisely...
a very smart and appropriate use of GPT-3 where it's this back and forth and they get these raw ideas from GPT-3, but then ultimately it is the people that mold it and combine it and create a play. So they sort of get ideas and brainstorm these high level things. And GPT-3 did offer some fun things about
you know, star-crossed love and different sort of apocalyptic things. And then apparently, you know, the final play was a play about a great collision in which humans are now beast men who have a passing resemblance to the Morlocks in The Time Machine.
So yeah, I think this is kind of a good example of how to do it right. And it's interesting that the play also kind of was there to show the audience how to interact with AI, what AI can do, and where the human element is needed. And on to our last article, Astrobee will find astronauts' lost socks.
All right, so at some point in the future, NASA wants to build this permanent space station called Gateway, which will be in orbit around the moon. And they expect Gateway to be empty a lot of the time, but they want it to be a welcoming space when astronauts do arrive. And
They have a project for autonomous systems already on board space stations to work with autonomous or semi-autonomous robots and manage any situations that require physical intervention. And one such robot is the Astrobee robot.
And the Astrobee robot, during their, I guess, current experimentation, is this small cube-shaped robot. It looks pretty cute and pretty futuristic, actually. So I do encourage you to go check out what it looks like visually.
Basically, one of its jobs is to navigate the station and look for any vents used for cabin air circulation. And it uses computer vision to automatically detect anything that might be blocking the vent and
in the example here, it's an astronaut's sock. And this is actually represented by a printed image of a sock, not an actual sock, to test the computer vision algorithm. But it looks pretty sci-fi and futuristic. Yeah, for sure. It looks pretty sci-fi. And I think the article also notes that
you know, these ones are little cubes, so they can't really, you know, use arms for stuff per se, but there will also be other robots that NASA has developed that could maybe do that instead. So it is pretty sci-fi to think, you know, there will be a bunch of autonomous or semi-autonomous robots on the space station while the humans are away, doing the stuff that's needed to keep it running.
Yeah, that's pretty crazy and certainly exciting. And I guess I didn't realize, you know, we had these ambitions of this whatever space station in orbit around the moon. You know, that's pretty, pretty interesting. And I wonder what is the timeline for all of this?
That may be hard to tell. They can probably tell you the intended timeline, but exact timelines are always unknown until they actually happen. But yeah, these robots are a little bit cute. They're in a compact form factor and they look like they have these two little eyes and they're just cubes. So they have a little bit of a WALL-E kind of aesthetic, I guess. They do. Yeah, yeah, yeah.
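As a rough illustration of the vent-blockage check described above, here is a toy Python sketch that flags a vent as blocked when the current camera view differs too much from a reference image of the clear vent. This reference-differencing approach is a made-up stand-in for illustration, not NASA's actual Astrobee pipeline, which presumably uses a learned detector.

```python
# Toy sketch: flag a vent as blocked when the current view differs too much
# from a reference image of the clear vent. Not NASA's actual Astrobee code.
import numpy as np

def vent_blocked(reference, current, threshold=0.15):
    """Return True if enough pixels differ from the reference view of the vent.

    Both inputs are grayscale images as arrays of equal shape with values in [0, 1].
    """
    diff = np.abs(current.astype(float) - reference.astype(float))
    changed_fraction = (diff > 0.2).mean()  # fraction of noticeably changed pixels
    return changed_fraction > threshold

# Synthetic example: a bright patch standing in for a (printed image of a) sock.
clear_vent = np.full((64, 64), 0.5)
with_sock = clear_vent.copy()
with_sock[10:40, 10:40] = 0.9
print(vent_blocked(clear_vent, clear_vent))  # False
print(vent_blocked(clear_vent, with_sock))   # True
```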
And with that, that's it for us this episode. If you've enjoyed our discussions of these stories, be sure to share and review the podcast. We'd appreciate it a ton.