
AI can't read the room

2025/4/28

Marketplace All-in-One

People
Leyla Isik
Topics
Leyla Isik: My research team and I work with short videos (for example, two people chatting, two babies playing, or two people performing a synchronized skating routine). We show these videos to human participants and ask them questions such as "Are these people communicating with each other?" and "Is the interaction positive or negative?" We then give the same videos to more than 350 open-source AI models. It turns out the AI models are far worse than humans at understanding what is happening in the videos. We found that essentially no model could do a good job of matching behavior or brain responses to different social attributes (for example, whether people are communicating). Surprisingly, they couldn't even reliably tell whether people were facing each other. We expected AI to fall short in some respects, but we were still quite surprised by how poor its overall performance was. Of the 350 models we tested, some performed better than others, which gave us interesting insights, but no single model could match all the human behaviors we tested. AI has a long way to go in understanding human behavior, especially in settings that require understanding human intentions and predicting actions, such as a self-driving car making a left turn. AI does poorly even at the most basic things, like judging whether people are facing each other. This shows there is still a lot of work to do on AI-human interaction: we need to improve these systems and find new ways to stress test them. AI falls short even at the most basic behavior understanding, such as judging where people are and how they relate to each other. Although AI has made amazing progress over the past decade, I think solving these problems may require a fundamentally different approach, not just more data and bigger networks. Many current AI customer-service applications are text-based, but to scale to broader applications, such as assistive robots, AI will need to interact with people based on visual cues. Historically, AI has drawn much of its inspiration from humans, cognitive science, and neuroscience, but in the latest AI boom those fields seem to have drifted apart. I think now is the time for them to come back together: we need to build the things humans care about, and the structure we impose on the world, into the design of AI models.

Stephanie Hughes: In this episode's interview, I discuss these findings with Professor Leyla Isik and what they mean for business applications of AI.


Transcript


Artificially intelligent? Yes. Socially? Awkward. From American Public Media, this is Marketplace Tech. I'm Stephanie Hughes. I met up last week with Leyla Isik. She's a professor of cognitive science at Johns Hopkins University.

I went to visit her lab, where she's got some watercolors of brains on the wall. Good brain art is important. Isik's the senior scientist on a new study looking at how good AI is at reading social cues. She and her research team took short videos. Sometimes it's like two people chatting and dancing, two guys doing this like synchronized skate routine. They showed the videos to human participants and asked them questions like, "Are these people communicating with each other?"

They also gave the same videos to over 350 open-source AI models. That's a lot, though it didn't include all of the latest and greatest. And Isik found that the AI models were a lot worse than humans at understanding what was going on.

One thing we found was that actually none of the models could do a good job of matching behavior or brain responses to these different social attributes. Like, are people communicating? Surprisingly, none of them could even do a great job at telling us things like, are these people facing each other? So I think we had a feeling that there would be elements of this that the AI could not capture, but we were pretty surprised by the poor performance in general. Exactly how bad it was. Yeah.

And so basically the AIs across the board couldn't tell if people were communicating, if they were facing each other. There was some variety. So like I said, we tested 350 models. Some models were better at it than others, which yielded some interesting insights. But no single model could provide a match to all the human behaviors we tested.
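A minimal, hypothetical sketch of what "matching" a model to human judgments can look like in practice: correlate a model's per-video scores for one social attribute with the mean human ratings for the same videos. The function name and toy numbers below are illustrative assumptions, not the study's actual pipeline.

```python
# Hypothetical sketch of the evaluation idea described above: score one AI model
# by correlating its per-video predictions for a social attribute (e.g., "are
# these people communicating?") with the mean human rating for each video.
# All names and numbers are illustrative assumptions, not the study's code.
import numpy as np
from scipy.stats import spearmanr

def match_to_humans(model_scores: np.ndarray, human_ratings: np.ndarray) -> float:
    """Spearman rank correlation between model scores and human ratings."""
    rho, _pvalue = spearmanr(model_scores, human_ratings)
    return float(rho)

# Toy data: 5 videos, mean human ratings on a 1-5 scale, model scores in [0, 1].
human = np.array([4.8, 1.2, 3.9, 2.1, 4.5])
model = np.array([0.9, 0.3, 0.6, 0.5, 0.7])
print(f"Match to human ratings (Spearman rho): {match_to_humans(model, human):.2f}")
```

Repeating this for each attribute and each of the 350 models would give the kind of model-by-attribute comparison described above, where a model "matches" human behavior when the correlation is consistently high.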

Why does this matter? Like, why would it be helpful for AI to be good at this? Yeah, well, I think anytime you want to have AI interacting with humans, you want to know what those humans are doing, what they're doing with each other, what they're about to do next. And I think this just really highlights how far a lot of these systems are from being able to do that.

What do your findings mean for possible business applications for artificial intelligence? Yeah, I think the businesses where this is probably closest to being applied, or currently being applied, are things like self-driving cars. The drivers have this intentionality, and so do the pedestrians, and you have to be able to understand that. For example, I think it's very hard for self-driving cars to make an unprotected left turn.

It's hard for humans too. It's hard for humans too sometimes. And when you do that, you have to really look around and think about who's doing what next and those sorts of things. And I think this just highlights how much more work needs to be done in the development of these systems to improve them, but I also think it highlights some new ways to stress test these systems against humans. We'll be right back. You're listening to Marketplace Tech. I'm Stephanie Hughes.

We're back with Leyla Isik, professor of cognitive science at Johns Hopkins University. I think some people envision this future where we all work alongside our AI colleagues or buddies, you know. And I wonder, what do your findings mean, for the short term at least, about AI's ability to do that? Like, will it be the Michael Scott in the office? Yeah.

Perhaps, but I think there are some even more baseline problems than that. Like I said, you want it to be able to tell what a person is doing, what they're close to, who's close to whom, and even those things more basic than reading intentions seem to be lacking as well. I feel like I'm a grown-up and I'm still learning how to pick up on social cues. It's like a lifelong process. Do you think the AI will get there?

Yeah, you mentioned you're a grown-up. I mean, it's really striking how much of this even little babies can do, though. Not to the fully sophisticated level that, you know, we keep developing and refining throughout development and throughout our lifetime, but there are some base abilities that seem to be present from at least very early in childhood. And

I think AI should be able to get there. And I think the progress AI has been making over the last decade or so has been really amazing. But I think that some of these problems might require a fundamentally different approach than the sort of brute-force, just-get-more-data-and-bigger-networks solutions that have been taking us pretty far. But I think there might be limits to that. Another place that AI is being used is in customer service. I wonder what your findings can mean for customer service and AI's use in that.

Yeah, I mean, I think right now a lot of those applications are all text-based, chatbot-type things. But if you really wanted to scale that up, to any sort of assistive robot or AI, you would want them to be able to interact with people based on visual cues. We use visual cues all the time to interact with each other. So I think this has important implications anytime you want an AI to be interacting with humans. Do you have any advice for AI makers? Yeah. I mean, I think historically, and still loosely to some extent, AI has drawn a lot of inspiration from humans, from cognitive science, from neuroscience.

And I think in the latest AI boom, those fields have sort of diverged. But I think this is an important point to start coming back together, where the things we know humans care about and the sort of structure we impose on the world can help improve these AI models. That's Leyla Isik, professor of cognitive science at Johns Hopkins University. We've got a link to her study on our website, marketplace.org. And we've got a picture of her brain art, too.

Daniel Shin produced this episode. I'm Stephanie Hughes, and that's Marketplace Tech. This is APM. If there's one thing we know about social media, it's that misinformation is everywhere, especially when it comes to personal finance. Financially Inclined from Marketplace is a podcast you can trust to help you get serious about your money so you can build a life you've always dreamed of.

I'm the host, Yanely Espinal, and each week I ask experts important money questions, like how to negotiate job offers, how to choose a college that you can afford, and how to talk about money with friends and family. Listen to Financially Inclined wherever you get your podcasts.