Hello and welcome to Skynet Today's Let's Talk AI podcast, where you can hear from AI researchers about what's actually going on with AI and what are just clickbait headlines. This week we'll look at yet more applications of AI for COVID-19 and then discuss a smattering of fun recent developments in AI.
I'm Andrey Kurenkov, a third-year PhD student at the Stanford Vision and Learning Lab. I focus mostly on learning algorithms for robotic manipulation in my research. And with me is my co-host. I'm Sharon, a third-year PhD student in the machine learning group working with Andrew Ng. I do research on generative models, improving generalization of neural networks, and applying machine learning to tackling the climate crisis.
And I hope you're doing well, Sharon. We can go ahead and dive right into this week's news stories that we are going to discuss. And the first one here is from Rollcall.com, and it's titled AI Researchers Seeking COVID-19 Answers Face Hurdles.
So this article is all about an effort called CORD-19, which was announced on March 16th by the White House Office of Science and Technology Policy and is basically a joint initiative between multiple large companies for combing through the scientific literature on COVID-19. There are now already
tens of thousands of papers, and to kind of get through it all and summarize it, there are actually attempts to use AI and machine learning to sift through it and kind of understand it. And what this article says in a nutshell is that it is turning out to be a little bit tricky, in particular because all these papers are in the PDF format, which is easy for humans to read, but not so easy for machines
to read. So, kind of an interesting development, I suppose, and maybe for any non-researchers listening,
it might be fun to learn that a lot of AI research, or research in general, is pretty much just reading and writing PDFs. So, yeah, I don't know. Sharon, is this surprising to you, that that's turning out to be a hurdle? It's not surprising, but I also want to note that while AI is touted as being able to do all of these things, we still can't get it to read PDFs for us.
And I find that pretty funny, that PDFs are still quite a big challenge, maybe less so the individual words, the text, but graphs and understanding the layout of things semantically and what things refer to. That is very, very challenging still, especially when the format is so diverse everywhere. And so humans are just so
easily able to adapt to reading varying PDFs and understanding what's being communicated.
Yeah. This article notes that the effort began with 29,000 papers, and now it's more than 50,000. And with PDF, I guess one of the problems is there's no real standardization. So it's not like you have a nice little tag for every section that is the same across all papers, or you have the same tags for images,
or even the format or how things are laid out is the same. So there's a huge amount of variety and humans naturally are able to deal with that variety, but AI and machine learning, not so much, not without a big effort.
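To make that hurdle a bit more concrete, here is a minimal sketch of the kind of raw text extraction these efforts start from, using the pdfminer.six library. The file name is just a placeholder, and real pipelines need much more work to recover sections, figures, and references from the flat text that comes out.

```python
# A minimal sketch of pulling raw text out of a paper PDF with pdfminer.six.
# "paper.pdf" is a placeholder; recovering structure (sections, figures,
# citations) from the extracted text is the hard part discussed above.
from pdfminer.high_level import extract_text

text = extract_text("paper.pdf")   # flat text, with layout and figures lost
print(text[:500])                  # peek at the first few hundred characters
```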
The article also notes that Kaggle, which brings together more than a million data scientists from around the world, is holding a competition to generate algorithms that would extract information and findings from these articles to answer questions such as what the incubation period for COVID-19 is, as observed around the world, as well as others. They would then feed that to biomedical researchers, who in turn would provide feedback to the data scientists on further questions.
Yes, and it also notes that the repository has articles primarily from the US, the UK, and the EU, and less so from China, where, of course, there have also been thousands of papers. So there aren't many Chinese-language papers, which is just yet another hurdle
to combing through all of the literature. Now you have multiple languages and presumably also different formats. And also to note, the database is also likely missing publications by government agencies. Yeah, so I think it's an interesting look at just how challenging and how massive this entire effort is, it just being
tens of thousands of papers. Even to get through it, we are trying to use algorithms, but as you said, modern AI is kind of limited, and it's not so easy to apply it to various things that humans find kind of easy. It takes a lot of effort. Speaking of limitations of AI in the wild, our next article is "Google's medical AI was super accurate in a lab. Real life was a different story."
So this was published in Technology Review, and it was a study from Google Health where, quote, it was the first to look at the impact of a deep learning tool in real clinical settings and illustrates that if it's not tailored to a particular clinical environment, AI can make things worse rather than better. Right.
And I will note that it probably is not; I know it's not actually the first look at the impact of deep learning in real clinical settings, but it is a powerful one.
And so Google deployed a deep learning system trained to spot signs of eye disease in patients with diabetes in 11 clinics across Thailand. And while Google's lab numbers report an impressive 90 percent accuracy and a 10-minute turnaround time for results, the system didn't really work so well in practice.
And this was because the system was trained on very high-quality eye images, so it didn't work well with the cell phone images that nurses were sending in. And this caused about a fifth of the images to be rejected by the system without any results, forcing the patients to inconvenience themselves and go into the clinic
for second exams. And the article states, quote, because the system had to upload images to the cloud for processing, poor internet connections in several clinics also caused delays. The Google Health team is now working with local medical staff to design new workflows. And so this is really interesting, that when we do develop an algorithm, we are often
holed up in our own dataset and our own methods and think that it might generalize beyond that dataset. But of course, that's not always the case in the real world when we do deploy something in a setting where it would actually be useful to people. So I think in research, sometimes we're not thinking about stakeholders enough upfront. And so this is perhaps one example of that. Yeah.
Yeah, I agree. I think over the last year, year and change, we've had the new Human-Centered AI Institute at Stanford that has really been pushing this idea that we need to be more interdisciplinary with AI as AI becomes more of a tool in many domains. You can't just have AI researchers building stuff out and expect it to work when you hand it off to an economist or a journalist
or, in this case, a medical practitioner, right? You need to interact with those disciplines. As you know, Sharon, we've talked about your work with climate change researchers. And yeah, you have to go there. You have to actually see what the workflow is and understand it before you're able to create a tool. So this is a really great illustration of that, I think.
I think even beyond the interdisciplinary part, what this is stressing is that Google does have people in-house who are doctors and who know the clinical setting quite well, but perhaps only in the U.S. system or perhaps the European system.
But they, I think, were not thinking as strongly about who they would actually hand this technology off to, where this technology would be the most useful. And it sounds like Thailand may have been that place, but that they didn't think about the quality of the images necessarily. And
we've definitely run into this in the lab and have thought about handling X-rays of varying quality. I think we'd heard about this work before this article came out, but it definitely is a huge issue in medicine. And the article also states that
existing rules for deploying AI in clinical settings, such as the standards for FDA clearance in the US or a CE mark in Europe, focus primarily on accuracy, and that there are no explicit requirements that an AI must improve the outcome for patients, largely because such trials have not yet been run.
And so this is extremely key, because in clinical trials, what you need to show every single time is that you improve patient outcomes. And it's very hard to define and justify that, but you must do that to pass a clinical trial. Yeah. And to that point of accuracy being basically
the only metric currently specified as important, we often have seen articles such as, you know, this new AI tool is now as good as radiologists at spotting cancer, or something like that. And so far I would say the important thing is to be a little bit skeptical of such headlines, because as we see here, it's not just about some sort of quantitative
metric of accuracy; you need to actually see the tool deployed and being used in the real world to really believe and prove that it functions well and actually assists patients. So it's a great reminder of that, I think.
But let's not be quite so dour. Let's move on to some kind of fun articles we can discuss. And the first one here, from The Verge, is titled OpenAI introduces Jukebox, a new AI model that generates genre-specific music. So it's all about how on April 30th, OpenAI announced a new generative model called Jukebox, which is a neural net that generates music, including rudimentary singing,
as raw audio in a variety of genres and artist styles. And so according to OpenAI, when you provide this model a genre, artist, and lyrics to condition on, Jukebox will output raw music that it generates just within its neural net
algorithm. So, if you listen to this thing, and I guess we'll try to splice in some sounds now if we can.
It's pretty impressive that we have gotten to this point already with AI, where we have sort of music that you can tell has some of the traditional hallmarks of the genres, and you can hear some of the singing and lyrics as intended. But of course, it still sounds kind of weird. And on top of that, there's been some criticism that OpenAI just scraped
huge amounts of music, of raw songs, to train this model without really asking for permission. Sharon, we listened to a few songs just before starting this recording, so I'm curious what your feelings were on this model and its music.
Yeah, so the melody and general tone were definitely down for the different genres of music. Sometimes you could pick up on the actual artist and their general style. I think what was interesting was that at times it sounded like
the singing was in a different language. And especially when you looked at the lyrics, it didn't sound very human at all, but you could tell it was some kind of singing. So that was really interesting, and I could see that improving over time. So I think it was still a very cool piece of work, and they showed so many different samples. I think the part about rights and everything is definitely an issue.
Yeah. And even with ImageNet and CIFAR-10 and everything, which are the foundational datasets for computer vision,
the creators did not necessarily ask for permission to be scraping those images. So I think generally in research, we should probably think a little bit more about this kind of stuff, especially here where there is clear licensing involved in music.
Yeah, exactly. There was actually a recent news article we didn't talk about, but that's quite related about how Jay-Z or someone representing Jay-Z issued a copyright strike on a YouTube video that used similar neural net models to generate kind of music in the style of Jay-Z, but with, you know, made up lyrics or made it sound like Jay-Z was saying something that he wasn't.
So that is kind of showing that artists, as these kinds of models grow, may feel differently about having their voice or sound-alikes generated by AI, which is, of course, pretty weird.
One note about the lyrics, which are quite inhuman, is that actually the lyric conditioning was done via an AI model working with the researchers. The researchers basically looked at different options the AI provided them and picked out what looked nice to them.
So it wasn't kind of like the model generating everything from scratch. You actually had a separate module for the lyrics, which was text-based and totally separate. But still, if you compare this to prior work...
as an engineering achievement, and in terms of the combined creation of singing and music in different styles, it's definitely kind of pushing what has been done so far with neural nets.
Yes, yes. And there are certainly some limitations that were obvious as we were going through, one being that sometimes the model was generating lyrics alongside a human helping out, an OpenAI researcher. Also, the article notes that, for example, while the generated songs show local musical coherence,
follow traditional chord patterns and can even feature impressive solos, we do not hear familiar larger musical structures such as choruses that repeat. So that still does not quite happen yet. And that kind of
structure may require some kind of memory, which neural networks have had a great challenge overcoming and managing. So it makes sense. Yeah. And to be a little more technical, the approach is based on taking a song, encoding it, kind of splicing it up, and then kind of reshuffling it in a way. So it's also not sort of completely independent of anything else.
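As a rough illustration of that encode-and-reshuffle idea, here is a toy sketch in PyTorch. To be clear, this is not OpenAI's actual Jukebox code, and the names and sizes are made up; it just shows the general pattern of compressing audio frames into discrete codes that a separate model could then learn to generate.

```python
# Toy sketch only: not Jukebox itself, just the general "compress audio into
# discrete codes" idea described above. Shapes and sizes are made up.
import torch
import torch.nn as nn

class ToyQuantizer(nn.Module):
    """Maps each audio frame to the index of its nearest codebook vector."""
    def __init__(self, num_codes=64, dim=16):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))

    def forward(self, frames):                        # frames: (batch, time, dim)
        # Squared distance from every frame to every codebook entry.
        dists = ((frames.unsqueeze(2) - self.codebook) ** 2).sum(dim=-1)
        codes = dists.argmin(dim=-1)                  # (batch, time) discrete tokens
        quantized = self.codebook[codes]              # frames rebuilt from the codes
        return codes, quantized

# Fake "audio" already chopped into feature frames, just to show the shapes.
audio_frames = torch.randn(2, 100, 16)                # 2 clips, 100 frames each
codes, quantized = ToyQuantizer()(audio_frames)
print(codes.shape, quantized.shape)                   # (2, 100) and (2, 100, 16)
# A second model (a large transformer, in Jukebox's case) would then be trained
# to generate sequences of these codes, which get decoded back into audio.
```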
I'm kind of curious, Sharon: you have worked some with generative adversarial networks, which are related in the sense that they create images conditioned on some input, and can also in some way be seen as creating new art. So yeah, how have you found working with GANs? Do you just stare at their outputs all day and it drives you crazy? Or what has that experience been like?
So to be clear, GANs can produce any type of output, including audio. So they are able to do things like this as well. And I believe DeepMind has work on this.
and their VAEs, or generative models very broadly, can produce images, text, audio, music, even human speech, like you would expect Alexa or Google Home or Siri to be producing, realistically. And I would say that
going through samples is fairly time-consuming. It also, I guess, just becomes the way you evaluate these networks. A lot of it is qualitative,
because we don't have benchmarks to necessarily work with, or metrics that we can easily evaluate these models with. I did put out work that tried to do this in a crowdsourcing manner, but as you train your model,
you're probably, as a researcher, going to be looking through samples to debug your model and to understand how to improve it. And I can imagine it could be quite challenging listening to things in this case, because for image samples, you can look at several samples at a time very easily as a person, but here you would have to listen to things sequentially to get a sense of what's going on.
Interesting. Yeah, I've seen quite a lot of GAN outputs, as you say, that can be images or something else. But the images are perhaps more advanced. They've had quite a bit of work done on them.
And one thing I've noticed is that as you see more and more of these AI outputs, you realize that the default, the average output, kind of gets boring pretty quickly, because you sort of figure out, oh, this is the sort of thing that it does, and it's not necessarily interesting after, you know, the first few novel experiences. And in the realm of images and art, it's actually been interesting to see there are now probably
half a dozen, maybe a dozen, really active artists using GANs.
And some of them have said that, you know, it's not just about having the algorithm, it's really about how you use it to make things interesting, because by default you get really generic things. So I suppose we can hope for something similar in this realm of music and audio, that ultimately these will become tools for artists to make more interesting things. There actually have been a few of these, but as we make more progress, maybe it'll become more democratized and more people can play with it.
All right. So that was a cool new method from the AI world. And now if we shift a bit to robotics, our next article is Meet Moxie, a social robot that helps kids with social emotional learning.
So the social robotics startup Embodied is launching a new robot called Moxie, a social companion aimed at kids around ages six to nine. And Moxie is designed to help promote social, emotional, and cognitive development, the article says, through everyday play-based learning and captivating content.
And so to be a little bit more specific, the goal is that through daily interactions, perhaps even just a few minutes at a time, Moxie will help children develop social and emotional skills. And Embodied was founded by Paolo
Pirjanian, whom we first met back in 2010 when we checked out the Mint floor-cleaning robot that he developed as the CEO of Evolution Robotics. Evolution was acquired by iRobot, the maker of the Roomba, in 2012, turning the Mint into the iRobot Braava and Pirjanian into iRobot's CTO.
He left iRobot in 2015 and founded Embodied the following year. So as someone who does research in robotics, what are your thoughts here, Andrey? Yeah, it's quite interesting. The article notes,
and we've discussed this also a little while back, that there has been a kind of wave of these kinds of social home robots that did not fare particularly well this past decade. There were like three big high-profile companies producing these sorts of little robots that could stand on your desk and talk to you and kind of emote things.
And they all sort of did not succeed, partially because they were very expensive, partially because Alexa and similar things came around and were much cheaper, and for various other reasons.
This looks kind of interesting because it's trying to be a little more focused. It's not just an Alexa alternative that is more emotive. It's actually very, very specifically meant to help kids develop. And I like a lot how it looks. I think it looks very well designed and...
probably well engineered. I do wonder, I guess, if this is really something most kids would need in addition to interacting with other kids and their parents, and if this is really that beneficial for kids who don't have something like autism, for which similar solutions have proven quite useful.
But personally, I would love to just play with it and see how it works. And if the research is there, I think it's a very cool idea. What do you think about this, Sharon? I think, on the one hand, it's a really interesting target market to go after,
especially as there's more and more research being done on how kids, as they spend more time on social media, are perhaps not learning these social-emotional skills as much. So that definitely is an area that I think parents at least are thinking about for their kids as they grow.
I think definitely in the autism space, I've heard stories of an autistic child, for example, getting really close with Siri, and Siri being a really great friend to this child. So perhaps it will trend in that direction. However, I don't see much else besides that, and it's not clear to me how much
this market will be receptive to a robot
as the way to solve this problem, especially as parents are trying to decrease screen time. So as long as they can argue that this is decreasing screen time, even though it's a screen, because it's instead focused on emotive behaviors, then perhaps parents will be more interested in getting something like this. And I really think the market here would be parents buying stuff like this. Yeah.
Yeah, I agree. It will be interesting to see if this can work. The article does note that Embodied says it has trained the robot on conversations that it has gotten from kids by working with about 100 families for more than a year, and that testing allowed the company to identify certain common themes like school, friends, bullying, doctors, and so on.
Maybe the idea is that this can be a tool for parents, you know, something that they can use to try and track the emotional well-being of their child. Of course, children can sometimes have a tough time being completely open with their parents. So maybe this is a nice kind of aid for that.
I think it's interesting. It's easy to see this as sort of dystopian, you know, now we need robots to raise our children. But since it is obviously aimed to be used in combination with parents and friends and normal human interaction, I think it could be quite useful if done right.
And with that, thank you so much for listening to this week's episode of Skynet Today's Let's Talk AI podcast. You can find the articles we discussed here today and subscribe to our weekly newsletter with similar ones at skynettoday.com. Subscribe to us wherever you get your podcasts and don't forget to leave us a rating if you like the show. Be sure to tune in next week.