98: Helping computers decode sentences - Interview with Emily M. Bender

2024/11/22
Lingthusiasm - A podcast that's enthusiastic about linguistics

People
Emily M. Bender
Lauren Gawne
Topics
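Emily M. Bender: I work in computational linguistics, focusing on multilingual grammar engineering and the societal impact of language technology. I developed a grammar matrix that helps people build grammars for different languages more quickly. I think that getting computers to understand language would require solving every problem in artificial intelligence, which makes it a very complex and ill-defined goal. At present, computers process language differently from humans: humans learn a new word by connecting it to concepts in the real world, while computers learn a new word by building associations between that word and other words. Rule-based and statistical computational linguistics are two different approaches: the former requires rules written by hand, while the latter relies on statistical models to process language. Rule-based grammar systems let us trace errors and debug them, whereas statistical models are a black box and hard to debug. Large language models are prone to bias because they are trained on data collected from the internet. Synthetic text generated by large language models is polluting internet data, which is bad for linguistic research. Large language models do not truly understand language; they only generate text according to statistical probability, so their output can be inaccurate or biased. The word "hallucination" is a poor fit for describing the erroneous output of large language models, because it implies that the model has perception. Large language models come with data problems, high computational costs, and labour exploitation. Symbolic grammar processing still has value, especially where precise answers are required. Linguistics can help us design natural language processing systems so that they are useful tools rather than misleading ones. Linguistics also lets us study the structure of language in depth, and so better understand the role language plays in the world.

Lauren Gawne: I talked with Professor Emily M. Bender about how computers process language and about the use of rules and statistical models in computational linguistics. We explored the limitations of large language models and the ethical issues to consider when building them, such as data bias, labour exploitation, and environmental impact. We also discussed how these models could be made more accurate and fairer, and how linguistic knowledge can be applied to natural language processing.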

Chapters
This chapter covers Lingthusiasm's 8th anniversary, the listener survey with linguistics experiments and advice questions, and details on accessing bonus episodes and Patreon gift memberships.
  • Lingthusiasm's 8th anniversary celebration
  • Listener survey with linguistics experiments and advice questions
  • Bonus episodes and Patreon gift memberships

Shownotes

When a human learns a new word, we're learning to attach that word to a set of concepts in the real world. When a computer "learns" a new word, it is creating some associations between that word and other words it has seen before, which can sometimes give it the appearance of understanding, but it doesn't have that real-world grounding, which can sometimes lead to spectacular failures: hilariously implausible from a human perspective, just as plausible from the computer's.

In this episode, your host Lauren Gawne gets enthusiastic about how computers process language with Dr. Emily M. Bender, who is a linguistics professor at the University of Washington, USA, and cohost of the podcast Mystery AI Hype Theater 3000. We talk about Emily's work trying to formulate a list of rules that a computer can use to generate grammatical sentences in a language, the differences between that and training a computer to generate sentences using the statistical likelihood of what comes next based on all the other sentences, and the further differences between both those things and how humans map language onto the real world. We also talk about paying attention to communities not just data, the labour practices behind large language models, and how Emily's persistent questions led to the creation of the Bender Rule (always state the language you're working on, even if it's English).
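For readers who like to see the contrast concretely, here is a minimal Python sketch of the two approaches described above. It is not Emily's grammar system or any real language model: the toy grammar, the two-sentence corpus, and the names `GRAMMAR`, `generate`, `train_bigrams`, and `sample_sentence` are all assumptions made up for illustration. The first half generates a sentence from hand-written rules; the second half picks each next word based only on how often it followed the previous word in the training sentences.

```python
import random
from collections import defaultdict

# Hypothetical toy grammar: each symbol maps to its possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["chases"]],
}

def generate(symbol, rng):
    """Expand a symbol by following hand-written rules until only words remain."""
    if symbol not in GRAMMAR:
        return [symbol]  # a plain word, nothing left to expand
    expansion = rng.choice(GRAMMAR[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part, rng))
    return words

def train_bigrams(sentences):
    """Count which word follows which in the training sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_sentence(counts, rng, max_len=20):
    """Pick words one at a time, weighted by how often each followed the previous word."""
    word, out = "<s>", []
    for _ in range(max_len):
        followers = counts[word]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return out

rng = random.Random(0)

# Rule-based: every sentence can be traced back to the rules that built it.
rule_sentence = generate("S", rng)

# Statistical: the model only knows which words tend to sit next to each other.
corpus = ["the dog sees the cat", "the cat chases the dog"]
bigram_counts = train_bigrams(corpus)
stat_sentence = sample_sentence(bigram_counts, rng)

print(" ".join(rule_sentence))
print(" ".join(stat_sentence))
```

In this sketch the rule-based generator always produces a complete "the N V the N" sentence, and an odd output can be traced to a specific rule. The word-pair sampler can stop early (for example after "the dog") or loop, because all it has is counts of neighbouring words. Neither half connects "dog" to any actual dog, which is the grounding gap the episode is about.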

Announcements: The 2024 Lingthusiasm Listener Survey is here! It’s a mix of questions about who you are as our listener, as well as some fun linguistics experiments for you to participate in. If you have taken the survey in previous years, there are new questions, so you can participate again this year. Take the survey here: bit.ly/lingthusiasmsurvey24

In this month’s bonus episode we get enthusiastic about three places where we can learn things about linguistics!! We talk about two linguistically interesting museums that Gretchen recently visited: the Estonian National Museum, as well as Mundolingua, a general linguistics museum in Paris. We also talk about Lauren's dream linguistics travel destination: Martha's Vineyard.

Join us on Patreon now to get access to this and 90+ other bonus episodes. You’ll also get access to the Lingthusiasm Discord server where you can chat with other language nerds. Sign up here: patreon.com/posts/115117867

Also, Patreon now has gift memberships! If you'd like to get a gift subscription to Lingthusiasm bonus episodes for someone you know, or if you want to suggest them as a gift for yourself, here's how to gift a membership: patreon.com/lingthusiasm/gift

For links to things mentioned in this episode: lingthusiasm.com/post/767803572750581760/lingthusiasm-episode-98-helping-computers-decode