AI and the future of dictionaries, with Erin McKean

2025/4/17
Grammar Girl Quick and Dirty Tips for Better Writing

People
Erin McKean
Topics
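Erin McKean: I'm skeptical of artificial intelligence. I think "artificial intelligence" is a misnomer and "imitation intelligence" is the more accurate name, because large language models are built on statistical patterns of English, while the meaning of a word depends on context. We ran a study testing how large language models perform on dictionary tasks; the results were mediocre, but research in this area is very active. Large language models are extremely expensive to train, which may make hiring a large number of lexicographers the more economical choice. Large language models "hallucinate" because they are essentially predicting the next most probable word, which is not necessarily a true or meaningful one. AI will change things, but its cost and return on investment may not be as favorable as people expect, because lexicographers cost far less than AI engineers. Getting a large language model to do well on a specific task usually requires dedicated training and tuning. Large language models are trained on vast amounts of text, the same data that lexicographers and computational linguists depend on, and this could affect academic research. If all that text is no longer considered fair game for research, progress in computational linguistics and lexicography will stall. It is currently hard to tell whether text on the internet was written by a human or generated by a large language model, which affects the analyses and findings of lexicographers and linguists. Large language models are bad at some lexicographic tasks, such as putting words in alphabetical order, identifying words in a corpus that are missing from a word list, and generating IPA pronunciations. Large language models are good at rewriting adult-level definitions for a lower reading level. There are far fewer opportunities to become a lexicographer now than there used to be. The decline in lexicography jobs is not entirely caused by AI; the internet and the decline of print media have also played a part. The internet has undermined the business model for dictionaries, but it has also made online dictionaries like Wordnik possible. For now I treat Wordnik as a hobby that takes up a great deal of my personal time, and I hope it can eventually become a project that supports a full-time job.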

Grammar Girl here. I'm Mignon Fogarty. And for the next few weeks, while we're taking a season break from interviews, we're going to release some of the best of the best bonus episodes that people who support the show through Grammarpalooza got during the regular season. This week, you're getting a behind-the-scenes conversation with Erin McKean, a lexicographer who runs the online dictionary Wordnik almost all by herself. And we're talking about how

AI is affecting dictionaries. She even did a study to see which dictionary tasks AI can do and which it can't. So we do these kind of extras every time I do an interview. So almost every week. And thank you to the current Grammarpalooza subscribers who support the show. You make this all possible and we really appreciate it. And if you're listening or watching and you would like to be a Grammarpalooza supporter and support the show,

We would also appreciate that. And you can sign up right on Apple Podcasts on the Grammar Girl show page. There's also a way you can get everything by text message. So to learn more about that, go to quickanddirtytips.com slash bonus. And you'll find links to both those options in the show notes.

If you want to help, you can sign up on the show page at Apple Podcasts, right there on Apple Podcasts, or you can get everything by text message. And you can learn more about that option by going to quickanddirtytips.com slash bonus. And both those links will be in the show notes.

Erin, thank you so much for being here. Oh, thanks so much for having me. Yeah. So one of the things that came up really briefly in the main segment was you mentioned AI in dictionaries. And I am really curious what you think. I mean, you know, recently, Dictionary.com laid off all

of their lexicographers. And although they didn't say it was because of AI, a lot of people have been speculating it's because of AI. And so I, I just, I just wonder what your thoughts are. And also because you're very technologically savvy, you do a lot of the tech work on your dictionary, you use APIs, you know? So like, what are your thoughts on this technological thing that seems to be coming for dictionaries? So, um,

I'm a little bit of an AI skeptic, but I, I'm, I'm kind of hurt by it because, so I think artificial intelligence is kind of a, a misnomer. And in fact, I saw somebody online who said we should call it imitation intelligence. And I think that's way better, right? Because,

All of these large language models are based on statistical patterns of English, right? So theoretically, English is just the statistical patterns of how we use words. Words really only mean things in context, which means that if you have a word in a sentence in one place and it's preceded and followed by the same words in another sentence, you can

pretty much feel like they're going to have similar meanings because they have similar contexts. Like my favorite example of like meaning depending on context is if you say the word toast to somebody, they don't know whether they're going to get champagne or a piece of bread with jam on it until you say more things. Right. And so I really thought that, and I still maybe think that there are some dictionary tasks that

these models would be good for. And so actually, Will Fitzgerald, who used to work at Wordnik with me, and I, we did a paper for AsiaLex about what's the return on investment of using a large language model for some of these dictionary tasks. So we used just straight out of the box, ChatGPT, and we ran it through some tasks. And it kind of...

Meh, right? But this is a really active field of research right now. There's been papers at Euralex. I'm sure that at the next dictionary society meeting, there are going to be more papers because people want to believe. Now, the thing is, I think that, and this is just me speaking in my personal capacity because I do have a day job where I work at Google, but I do not work on anything AI related at Google. I work in the open source programs office. Okay. A lot of these models...

are maybe not as cheap as we think they are. There's an environmental cost. They're cheap now kind of as a loss leader, but they cost an enormous amount to train. Some of these models, it's been estimated they cost like a trillion dollars to train.

I don't know about you, but I think I could hire a lot of lexicographers for a trillion dollars. Yeah, that's mind boggling. Yeah. And so the reason that that investment is considered to be worthwhile is that they think they're going to be general purpose, right? It's generative AI. You can generate anything you want. But is it in fact going to work? Like we've all seen these hallucinations that come from these models, right?

And when you think about it, what they're really giving you is what they think the next most probable word is. Is that a true word? Often not. And yeah, there's so much on this. I'm reading this great book now that we're actually going to feature in a Wordnik blog post, the Five Words From blog, where we take five words from an interesting book. It's called AI Snake Oil. Yeah. And it's got some really interesting ways to think about AI, generative models, predictive AI, right?

that I think are super useful. Like what is something good for and how can you tell is I think the key problem of AI. And I can rant about this probably for hours because I found it a fascinating topic. But the short answer is it will probably change things. It's not going to be as cheap or as easy or as high a return on investment as people think, because you know what else? It's

lexicographers are way, way cheaper than AI engineers by like an order of magnitude.

So if you have to hire a data pipeline engineer and an AI engineer and someone to write the code and someone to tune the model, there's a lot of lexicographers you could have hired for that money. And tuning the model is really the thing, right? Because, I mean, you could say, okay, you're spending a trillion dollars, but you're not just replacing lexicographers. You're going to use it for medicine and scientific research and...

you know, a whole bunch of other things. And so the idea is that like, maybe if it's spread out over all those things, then maybe, you know, maybe it's worth the big investment, but to get something that is really good at a

task, then you have to do some special training generally. Is that right? Am I understanding it correctly? Yeah, you want to tune the model so it gives you the kinds of outputs that you want, kind of like tuning an engine. The other thing that is a problem for lexicographers in particular about AI is that these models have to be trained on vast amounts of text.

And everybody right now is in an arms race to collect as much text as they can because they think bigger is better. And there's some research coming out now that means maybe that's not true. But like HarperCollins announced today that they're going to license their books, published books to an unnamed AI company. And people are starting to feel like, well, this tool might replace me. Why would I give it access to my work so that it can be me? But that's the same data that lexicographers depend on to make dictionaries. Yeah.

That's the same data that computational linguists depend on to do their research. If all that data is no longer considered fair game for research because the AI tools have been bullies about it, our scientific progress in these areas will crawl to a halt because no computational linguist, no lexicographer has the money to do that kind of licensing.

That is so interesting. Yeah, I saw that too. And people are angry and they're saying, no, you can't use my books. So, but yeah, but then if they say the same thing to less intimidating, less aggressive researchers, that is a problem. And we've always considered this fair use because if you look up a word in a dictionary and it has one sentence from your book,

That doesn't replace the value of your book to someone. Like, hopefully we don't use the one sentence that gives away the plot. But generally, it's not considered a competing good, right? But if you train an AI model to write novels in the style of Erin McKean, I would be irked. I would be very mad. Plus, like, why? And then also...

The other thing that's a problem for lexicographers and for linguists generally is we don't have a good way to understand what text on the internet has been generated by a large language model and what has been generated by a human being. So if I see an example sentence from a blog post and I can't tell whether it was written by a human or not, we're feeding that data into all of our analysis and systems.

Maybe we're just, you know, describing robot English and not human English. We don't know how much text on the Internet is LLM generated at this point. It's very, very difficult to tell. And anybody who says that they can tell is trying to sell you something. Right. And definitely people are reporting that they're seeing things that seem pretty obviously written by AI showing up high in the Google search results. You have to be really careful these days.

Yeah. Yeah. I'm curious, in your paper, what tasks you tried to get it to do that it wasn't so good at. Like, I'm very curious. A task that LLMs are bad at that the most junior lexicographer can do is putting words in alphabetical order. Because words in general online are not in alphabetical order, it can't alphabetize.

It also wasn't good at looking at a list of words and looking at a corpus and saying, hey, what words are showing up in this corpus that don't show up in this list? It wasn't good at generating IPA pronunciations because IPA pronunciations don't really show up in the data that much. There's not enough correlative information for it

to have that be a reasonable task for an LLM. Yeah. The task that I thought it did the best at was taking definitions written in an adult level and rewriting it for a lower reading level. That actually worked pretty well. Yeah. I was trying to think of what are the most boring lexicographical tasks that we can outsource to an LLM. And I was like, oh, this is not...

What does every lexicographer hate doing? I think it differs. I really dislike writing IPA pronunciations because I'm really bad at it.

Um, but, uh, I don't know, that would be like, if there were enough lexicographers to fill a medium sized concert hall at this point, like we could do a nice survey. Yeah. I was thinking about that when you were saying sort of a non-replicatable career path. I mean, it seems like today it's much harder to become a lexicographer than it was, you know, 15 or 20 years ago. Yeah. There are no jobs. Yeah. Just no jobs. I probably haven't gotten an email query from a student.

about how to become a lexicographer in maybe six months at least. I used to get them on a monthly basis. Yeah. That's a really sad thing. That makes me...

Really, really sad. Those are cool jobs. I want to like go to bed at night knowing that somewhere in the world there are a hundred lexicographers, you know, looking into word. I, yeah, I mean, yeah. There's a really lovely quote by J. R. Hulbert, who was a lexicographer in the thirties, where he talked about it's the best job in the world because something like you go to sleep every night, like, uh,

feeling that you have advanced the great work towards its completion and that all of your problems are small but so absorbing and you never go down like a dead end alley for months or years at a time.

And then have to like backtrack. No, you're always making progress. The problems are small, but they're really interesting. And, you know, it's it's really true. Yeah, that's great. And it's not AI that has caused this problem entirely. The fewer jobs for lexicographers, I think it's been happening for a long time. And sort of the Internet has been been really undermining the business model for dictionaries, you know, for for quite a while.

For everything print, basically. Like the advertising revenue does not make up for the loss of the people actually buying a physical object revenue. But, you know, I can't complain because Wordnik couldn't exist without the internet. We couldn't put Wordnik in a book, right? Right. And, you know, it's hard to figure out like what would be a better business model, but I feel very lucky now.

And Wordnik is basically my incredibly elaborate hobby. I work on it mornings, evenings, weekends. It's like running a small, like little theater, right? And I don't have to...

I don't have to like do capitalism for it to work as long as I'm willing to put in my unpaid labor. And if I can just outlast this particular weird business cycle, then hopefully by the time I'm ready to retire, it would actually be enough of a money-making concern that someone could take it over as a full-time job again. And that's my goal. Yeah. It's a pretty good goal. And in the meantime, I'm never bored. Right. Yeah.

But you do have hobbies. You have other hobbies, too. So I did say we would talk about your love of dresses. You have a dress a day thing going on and you wrote a novel about that. It's very tied into this dress a day idea, too. Right. Can you talk just a little bit about that? Yeah. So about 20 years ago, I started a blog called A Dress A Day. And for the first I don't know how many years I actually did blog

about a dress almost every day. And then when Wordnik became a startup, I was like, I have no more time. But I just love dresses. I feel like they're the most fun piece of clothing and I love to sew and I like making dresses. So I thought, well, it was kind of like a chicken and the egg situation. So I started blogging

about dresses. And then if I showed up someplace, not wearing a dress, people were kind of pissed off at me. Why aren't you wearing a dress? You're the dress person. So then I basically have worn nothing but dresses for probably 15 years. Yeah.

I mean, unless I'm, you know, at a yoga class or whatever, but. Yeah. And you, you weren't making a dress a day yourself, were you? No, no, no. That, I mean, it's theoretically possible, but I would probably have all kinds of repetitive strain injuries now if I were doing that. Yeah. I make like a couple of dresses a month in a good month. Okay. And then, and then you, and do they all have pockets?

They all have enormous pockets. I feel like if you can't put your arm basically like almost up to your elbow into a pocket, is it even a pocket? Yeah. Good job. I want to be able to carry like three paperback books and a small rabbit. Like...

Awesome. What's your novel, though, that you wrote about dresses? Oh, so the novel is called The Secret Lives of Dresses because for a while on the blog, I was writing these little like storylets from the points of view of dresses. And then I got some interest from agents saying, hey, should this be a book? And I had worked in publishing long enough at that point to know that like collections of short stories don't sell. So I was like, let's write a frame novel

where the stories can be in the novel, but they're not the novel, if that makes sense. So it's a perfectly like standard off the shelf chick lit novel.

One of my favorite novelists, Kathleen Norris, she was like the best-selling novelist, best-selling women's novelist of the 1930s. She said that her whole thesis for fiction was get a girl in trouble and get her out of it. And I think that's like the plot of all chick lit, right? Get a girl in trouble, get her out of it. And we like that. There's like a happy ending. It's nice.

So yeah, and it still sells. It's still in print, which I'm very happy about. It's nice to have a book that's still in print. And it did really well in Australia. So it's been optioned for film in Australia. Oh, that's amazing. So yeah, hopefully it'll get made into a movie someday. Oh, let me know if it does. I will tell everybody because it'll be amazing.

Yes. And you'll have to go to Australia. You know, it'll be a tax write-off to go to Australia to watch it. Oh yeah. I will absolutely travel to Australia for that. Australia is fun. Have you been? No, I'd love to go. I highly recommend it.

Yeah. My college roommate lives in Australia, so I really, really should go. You have a built-in excuse. I do. It's far though. It's really far. It is a really long flight. Yeah. But as long as you're there, you should also go to New Zealand. Yeah. And then like, do I have a month? I don't, I don't think I do right now.

Well, talking about books, let's wrap up by talking about, you know, we ask guests to recommend their favorite books. And so can you share some of your favorite books with us? Oh, so it was really, really hard to pick some favorite books because I really love books and have far too many books. So one of the books that I suggested was a book by Diana Vreeland. Actually, she's in this picture behind me. She was the person who basically invented the modern role of the women's magazine editor.

Like she also was the person who started the Costume Institute at the Metropolitan Museum of Art. And she was absolutely 100% bonkers and in the best possible way. And she had this column for Harper's Bazaar that she called, Why Don't You? And they were just ridiculous suggestions. And I actually took the book off my shelf. And so like, it's

It's called... this is the book: Diana Vreeland. It's a white cover with red text at the top and the bottom. And then Why Don't You is diagonally across the middle in black italic text. Yeah. And so she's like, why don't you turn your old ermine coat into a bathrobe? Why don't you? The most famous one of her Why Don't You suggestions was, why don't you wash your children's hair in flat champagne? So...

And so I love reading stuff by Diana Vreeland. I love reading stuff about Diana Vreeland. Like I, for a long time, had a Twitter account where I pretended to be Diana Vreeland and just said Why Don't Yous all the time that were just bonkers. I'm going to move that over to Bluesky, I think, very soon. Yeah. I just find her delightful and fascinating.

So is that a collection of her columns or is it a biography? It's a collection of stuff that she did at Harper's Bazaar and includes most of her Why Don't Yous. There's another fascinating book that's all the memos she sent when she was an editor at Vogue that are also absolutely unhinged. Yeah.

Amazing. Yeah, you have to check that out. But I like to pick this book up and look at it when I feel like I'm stuck in a rut. So do you know there's this thing called the Oblique Strategies? Basically, it's a set of cards or there's online versions of them. And it's something to do when you feel stuck in a rut. And it's like, take the last thing you did and reverse it. Right. But I think the Why Don't Yous just send me off into even further directions. Hmm.

Thinking about flat champagne as shampoo. Just getting out there. Okay, what other books do you have for us? I totally cheated and recommended the Steerswoman books by Rosemary Kirstein. They're basically science fiction fantasy. Why is that cheating? There are four of them. Because there are four of them. Oh, okay. And if you like...

If you like your fantasy novels to have like a big dollop of linguistics in them, like keep reading that series because there's just an amazing linguistics bit in like book three. And the whole premise of the Steerswoman books is that there are people who are called steerswomen. And this is what they do. They walk around and they ask questions and try to learn something.

And if you don't answer a steerswoman's question, no steerswoman will ever answer a question for you ever again. So it's kind of like they're like kind of itinerant roving questions

librarians. Oh my gosh. I love it. It's so good. It's so good. And sometimes I hesitate recommending these because the author is still working on book five, but the books, the books end in a decent place. It's not a big cliffhanger. Like you can get all the enjoyment that you could possibly want reading these four books. There's a lot. I mean, it's, it's a lot to start with. Yeah.

That sounds amazing. They're so good. Anything else? The other thing I like if you know, when they ask you who you would have a dinner party with anybody in history, like Diana Vreeland is one and Samuel Johnson is another. And I love reading about Samuel Johnson. And one of my favorite books about Samuel Johnson is called Samuel Johnson and the Life of Writing.

by Paul Fussell. And it's a book that's as old as I am, literally. But he really talks about how Samuel Johnson approached writing, not just the dictionary, but everything he wrote. And he was so prolific and talks a lot about his own personal struggle because he really wanted to be a better person than he thought he was.

And so a lot of his writing is in genres that we don't really think about today, like prayers. Who sits down to write a prayer as part of their regular writing practice? Very few people do that, and most of them are in holy orders, right? Anyway, I just find Samuel Johnson endlessly fascinating. Yeah.

And I think that lots of people just read like, you know, the Life of Johnson. But I think there's so much more. There's so much more you can read about Johnson. Yeah. I mean, we usually end here, but that actually made me wonder. At the beginning of the main interview, you said that you wanted to be a lexicographer since you were like eight or nine. And I wonder, did you learn about

Johnson and become fascinated with him? Or did you just love, you know, were there dictionaries in your house and you just loved reading them and thought people write these and I want to do that? This is the dumbest story. So I was a, like, voracious reader, like read for hours every day, always had the most books checked out from the library that I was allowed to by law, basically. And

I also read anything that came into the house and my parents were pretty like laid back about it. And my dad, who was in sales, got the Wall Street Journal. And I would like read the fun parts of the Wall Street Journal, which were like, there was something called the, I didn't know this at the time, but it's called the floating column. And it's the human interest story on the first page. Mm-hmm.

And, um, there was a story about the second edition of the OED and how it was overdue by like 27 years. And so that's, you know, like four times as long as I'd been alive at that point. And I was like, wow. So I was like, wait, people make dictionaries and that's what that job is like. And I could do that job. Yeah.

And so I was like, I'd like to make dictionaries. And like, I was a little girl in North Carolina. Like nobody knew anything really about dictionary making. There are no dictionaries in North Carolina, like dictionary companies in North Carolina. So they were like, sure, fine, honey, whatever you want. And nobody ever talked me out of it. Right. Like nobody said, oh, there's actually fewer jobs for lexicographers than there are for ballerinas. You know, like,

Oh, that's a great story. That's not a stupid story at all. I bet the people at the Wall Street Journal would be really surprised to know that they were inspiring children. I once actually met someone who worked for the Wall Street Journal and who was one of the editors of that column. And I told her about it. She was like, really? And I still have the newspaper article. I cut it out of the paper. I hope I let my dad finish the paper before I cut it out.

Oh, that's great. I have it in a folder. You need to frame it. It should be on the wall next to the woman you love. It has, it has like my kid handwriting with the date on it. Like, Oh my gosh, that's perfect.

Well, Erin, Erin McKean, thank you so much for being here today. Where can people find you online? Oh, well, you can always find Wordnik at wordnik.com. And I'm on Bluesky as E. McKean, I think, at Bluesky Social. And yeah.

Yeah, that's pretty much it. Oh, dressaday.com is my blog about dresses. Yeah. And it's McKean, M-C-K-E-A-N. Yes. My icon basically everywhere. My avatar everywhere is a little pink robot. So look for that as the sign of authenticity in your Erin McKean content. Great. Thank you so much. Bye-bye. You're welcome.

I hope you enjoyed that bonus segment. If you didn't catch the full interview, the main show where Erin talked about how she runs her online dictionary Wordnik almost all by herself back in November, you can find it in your feed or linked in the show notes. And thank you again to all the Grammarpalooza supporters. We appreciate your help

so much. And if you're listening or watching and you would like to become a Grammarpalooza supporter, you would have gotten this show back in November. And most importantly, you know, you just help by showing your appreciation for the show. So you can sign up on the show page, the Grammar Girl show page at Apple Podcasts, or to get everything by text message, you sign up through Subtext.

And links to both of those are in the show notes. And if you want to learn more, you can find out more at quickanddirtytips.com slash bonus. That's all. Thanks for listening.