Current AI, the fact that it's just so opaque and nobody knows what's going on under the hood, that's just not healthy. I'm lucky to be able to talk to experts, getting people who've got different backgrounds, different perspectives, that stimulates my thinking. And, you know, I feel privileged that I get access to people who can do that. And I call that getting unconfused.
Welcome to Unconfuse Me. I'm Bill Gates. My guest today is Dr. Yejin Choi. She's a computer science professor at the University of Washington, also senior research manager at the Allen Institute for AI, and recipient of a MacArthur Fellowship.
And she does amazing work on AI training systems, including work on natural language and common sense. This year she gave a great TED Talk entitled "Why AI Is Incredibly Smart and Shockingly Stupid."
Welcome, Yejin. Thank you so much, Bill. I'm so excited to be here. Are you surprised at the advances that have come in the last several years? Oh, yeah, definitely. I didn't imagine it would become this impressive.
What's strange to me is that we create these models, but we don't really understand how the knowledge is encoded, you know, like to see what's in there. It's almost like a black box, even though we can see the innards. And so in understanding why it does so well or so poorly, we're still pretty naive. Yeah. One thing I'm really excited about is our lack of understanding
of both types of intelligence, artificial and human intelligence. It really opens new intellectual problems. There's something odd about how these large language models that we often call LLMs acquire knowledge in such an opaque way, and then can perform some tasks extremely well while surprising us with silly mistakes somewhere else. Yeah, it's been interesting that
Even when it makes mistakes, sometimes if you just change the prompt a little bit, then all of a sudden it works, so even that boundary is somewhat fuzzy as people play around. Yeah, totally. So, quote-unquote, prompt engineering became a bit of a black art, where some people say that you have to really motivate the transformers
in the way that you motivate humans. Like, you know, one custom instruction that I found online was about how you first tell the LLM, "You are brilliant at reasoning, you really think carefully," and then somehow the performance is better, which is quite fascinating. But I find...
two very divisive reactions to the different results that you can get from prompt engineering. On one side, there are people who tend to focus primarily on the success cases, and so long as there's one answer that is correct, it means that transformers or LLMs do know the correct answer; it's your fault that you didn't ask nicely enough.
Whereas on the other side there are people who tend to focus a lot more on the failure cases, and therefore nothing works. Both are some sort of extremes, and the answer may be somewhere in between, but this does reveal a surprising aspect of this thing. Why does it make these kinds of mistakes at all?
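(A minimal sketch of the kind of "motivational" custom instruction being described, assuming the OpenAI Python client; the model name and the sample question are placeholders, not anything mentioned in the conversation:)

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# The "motivational" system prompt; the exact wording here is a paraphrase.
system_prompt = ("You are brilliant at reasoning. "
                 "Think carefully, step by step, before you answer.")

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "A bat and a ball cost $1.10 in total. "
                                    "The bat costs $1.00 more than the ball. "
                                    "How much does the ball cost?"},
    ],
)
print(response.choices[0].message.content)

(Swapping the system message for an empty one and re-running is a quick way to see the prompt sensitivity they are describing.)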
But we saw a dramatic improvement from models the size of GPT-3 going up to the size of GPT-4. I mean, I thought of GPT-3 as kind of a funny toy, almost like a random sentence generator that I wrote 30 years ago. It was better than that, but I didn't see it as that useful. And so I was shocked that GPT-4, used in the right way, can be pretty powerful.
As we go up in scale, you know, say another factor of 10 or 20 above GPT-4, will that be a dramatic improvement or a very modest improvement? I guess it's pretty unclear. Yeah, good question, Bill. I honestly don't know what to think about it. But, I mean, there's uncertainty is what I'm trying to say. I think there's a high chance that we'll be surprised again by even further increased capabilities.
And then we will also be re-surprised by some strange failure mode. And more and more, I suspect that the evaluation will become harder, because people tend to have a bias toward believing the success cases, and we do have cognitive biases in the way that we interact with these machines. And the machines are more likely to be adapted to those
familiar cases, but then when you really start trusting them, they might betray you with unexpected failures. So yeah, interesting times, really. Yeah, one domain that is almost counterintuitive, that it's not as good at, is mathematics. And so, you know, you almost have to laugh that something like a simple Sudoku puzzle is one of the things that it can't
figure out, where even humans can do that. Yeah. So it's reasoning in general that humans are capable of that these ChatGPT-style models are not as reliable at right now. And the reaction to that, I think, in the current scientific community is a bit divisive. On one hand, there are people who
might believe that with more scale, the problems will all go away. And then there's the other camp, who tend to believe that, wait a minute, there's a limit, a fundamental limit to it, and there should be better, different ways of doing it that are much more compute-efficient.
And so I tend to believe the latter. But anything that requires symbolic reasoning can be a little bit brittle. Anything that requires factual knowledge can be brittle. It's not a surprise when you actually look at the simple equation that we optimize for when training these large language models, because really there's no reason why such capabilities should suddenly pop out. And so I wonder if the future architecture
may have more of a self-understanding, reusing knowledge in a much richer way than just this forward
chaining set of multiplications. Yeah, I mean, right now transformers like GPT-4 can also look at such a large amount of context, literally. It's able to remember so many words that were just spoken. Whereas humans, you and I, we both have a very small working memory, and
so the moment we hear new sentences from each other, we kind of forget exactly what was said earlier, but we remember the abstract of it. So we have this amazing capability of abstracting away instantaneously while having such a small working memory. Whereas right now, GPT-4 has an enormous working memory, so much bigger than ours. But I think that's actually...
the bottleneck in some sense, hurting the way that it's learning, because it's relying overly on the surface patterns, as opposed to trying to abstract away the true concepts underneath any text. One of the areas where the Gates Foundation would love to see something is a math tutor. And there's a question: do you need a big, big, big model to do that?
You know, because if you make it so big, then our ability to know,
you know, how it behaves, it's hard to test. So we're hoping that more medium-sized models that mostly learn from math textbooks won't have such a broad knowledge of the world. At least we're hoping that that will let us do quality assurance. So in academia, there's actually a lot of such effort going on, but without a lot of compute, including my own work that tries to develop
specialized models. So usually the smaller models cannot win over ChatGPT in all dimensions. But if you have a target task like math tutoring, I do believe that definitely not only can you close the gap with larger models, you can actually surpass the larger models' capability by specializing on it. So yeah, this is totally doable, and I believe in it.
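(A rough sketch of that specialization idea, assuming the Hugging Face transformers and datasets libraries, a small open model, and a hypothetical math-tutoring dataset math_tutor.jsonl with a single "text" field; none of these specifics come from the conversation:)

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "EleutherAI/pythia-1.4b"  # any moderate-sized open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical data: each JSON line holds {"text": "Student: ... Tutor: ..."}.
data = load_dataset("json", data_files="math_tutor.jsonl")["train"]
tokenized = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=data.column_names,
)

# Standard causal language-model fine-tuning on the tutoring dialogues.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="math-tutor", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

(The point is only that a moderate-sized open model can be trained end to end on curated, task-specific data in a way that a closed API model cannot be.)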
Yeah, certainly for something like drug discovery, you know, knowing English isn't necessary. And it's kind of weird these models are so big that very few people get to kind of probe them or change them in some way.
And yet, in the world of computer science, the majority of everything that was ever invented was invented in universities. To not have this in a form that people can play around with, taking really 100 different approaches, we have to find some way to fix that, to let universities be pushing these things and looking inside
these things. Yeah, couldn't agree with you more. It cannot be very healthy to see this concentration of power, where the major AI is only held by a few tech companies and nobody knows what's going on under the hood. That's just not healthy. Especially when it is extremely likely that there are moderate-sized solutions that are open and
that people can investigate and better understand and even better control, actually. Because if you open it, it's so much easier for you to adapt it to your custom use cases, compared to the current way of using GPT-4, where all that you can do is sort of prompt engineering and then hope that it understood what you meant. The math tutoring case seems to be the case where the
language models have seen a lot of education material already out there online. So that probably is indeed much more around the corner, because they have seen a lot of data. Whereas with drug discovery, now the challenge is for AI to come up with something new that doesn't exist yet. So I suspect that that's a little bit of a different type of challenge for AI, because now it truly needs to reason more,
more in a symbolic manner that is grounded in knowledge, as opposed to, oh, there's a bunch of these sequences and let's predict what comes next
and get lucky. And so, yeah, that's just inspiring for me, to think about the different types of challenges and what it might take in order to push things to the next level. But I think that's basically the future. And I'm excited to see a lot more open-source effort really catching up rapidly right now. The fact that it's just so opaque, and that currently learning is unbelievably brute force,
which I don't think is the correct way of doing intelligence. There must be a better solution. And for that, we have to open it up in order to be able to really promote better science around it. We need to open it. We don't have to open the largest, best one, however, because even if you open it, it's not like academic people can do anything with it. Like, if GPT-4 is open for me, there's no compute for me to run it on. So, yeah. Yeah. I think just...
to deal with the complexity and the accuracy, you probably want to build these things from scratch. Yeah, I believe with a bit more effort, something like that could be built. And with that wishful thought, I'm also working toward that sort of system, where we might have a little bit more explainable,
descriptive knowledge that we can give to the machine to really, truly learn and memorize. And then when it does make mistakes, being able to control the machine by asking, oh, what kind of knowledge are you assuming for that kind of answer? And being able to say, oh, you know, your assumption is wrong in that way, and, you know, from here on, learn this knowledge. So I think that kind of
problem really unlocks exciting new types of machine learning problems, where you need to be able to unlearn, not just learn, but unlearn the incorrect knowledge and then be able to revise it, in the way that humans also are able to. Whereas right now, everything, like you said, is a bit too black-box. But I do think that with effort, this sort of technology could happen.
Yeah, I mean, someday maybe we'll understand how knowledge is represented in the human brain. You know, it's one of the great mysteries, how evolution did that. Do you think in the end, let's say we figure out both the software and the real brain, that we'll end up seeing that there are similar algorithms underlying how they work? Oh, good question. What do you think? Well...
I think there are aspects, like visual recognition, where we can see that as you go up and you're kind of trying to get to higher-level representations, some of the same mistakes that the human visual system makes weirdly appear in these systems. So that at least suggests that there's a common way of doing it.
And, you know, evolution was sort of trying out different approaches. And so it may be that there's this one fundamental approach that we see a glimpse of in software, that evolution, quote, "discovered" and managed to use. I mean, it's the greatest discovery.
Humans' reasoning capability is so phenomenal. Yeah, totally. So, yeah, evolution somehow figured out the algorithm behind our amazing learning capabilities, but we humans haven't figured out the AI version of it yet. Right.
I suspect that there's definitely a better algorithm out there that we haven't discovered. It's just that right now there's a bit too much focus on let's make things larger, and everybody's trying to do that. Whereas there may be a better, alternative solution that's waiting to be found.
But there's just not enough attention there, because people tend to think, oh, it's not going to work. The reason why I very strongly believe that is in part because, when you look at... Oh, actually, let's go back to Microsoft and the very first personal computer, because...
When that first came out, it was really super exciting and amazing. And then every single year, you know, there's a better computer, a smaller computer, a faster computer, and it becomes better and better. So similarly, when we look at phones or, you know, rockets,
cars, the very first invention is never the optimal solution. There's always a better solution. So I do think that there's a better solution. It's just that right now there's way too much emphasis on the bigger the better. I do think, like in the math tutor case, though, the downside of a mistake
can be pretty modest. And I think we are seeing that we should be able, give us two or three years, to create something there that is pretty profound for engaging learners in a way that's motivating and at the right level for them. So that'll be a pioneering step, and being able to
test that, you know, is not the same as relying on it for dangerous decisions. Yeah, I totally agree. Are you worried that things could go too fast, you know, and almost have humans ignore the control and the misuse issues? You know, the sense of purpose of humans, if we're sort of dumb compared to the AI, that...
I'm more worried about that now than I was a few years ago. Even I get a bit of an uneasy feeling if, hypothetically, AGI suddenly does arrive and it's just all-around better than us. How are we supposed to think about that? Like, why?
Are they going to replace all of us, and we just go on vacation all the time? That sounds really boring. So although that thought experiment is quite interesting, even if that doesn't happen, still I worry that AI is impacting human life a lot already, and it will do so even more in the coming years. And it seems that,
unless we put in the right kind of effort to understand where the
limitations and capabilities are, and then try to develop both policy and also other AI techniques that can better control this impact on humans, this could be disastrous. So if we're not ready for it, it could be very hard on us. So I'm hoping that
I'm at least optimistic that more and more people worry about this, and there's a lot more conversation going on. So I hope that's a sign of people taking more action around it. But yeah, it's a concern. I thought that we would get these super-capable kind of blue-collar robots way before this reading and writing thing
became at least somewhat possible. And so the inversion, that we don't know how to pick parts out of a box, but we know how to rewrite the Pledge of Allegiance the way Donald Trump would write it, of those two tasks, the robot task I thought of as much easier. And so it would...
Yeah, that's a really sharp observation, Bill. And there's actually a name for it, which is Moravec's paradox: the perceptual tasks that seem easier for humans are actually much harder for AI, compared to, say, a game of chess, which is harder for us,
but which is actually easier for AI. And in fact, that inversion happens in other ways as well. So I'm currently proposing this thought, the generative AI paradox, where it might be that somehow generative capabilities are stronger than the understanding capabilities, which again is maybe a little bit the inverse of how humans tend to be able to understand
amazing novels, but we just find it harder to write them. And again, paintings we can appreciate without being able to paint those great paintings ourselves. Whereas right now it looks as if these capabilities are a little bit reversed, because when you look at DALL-E 2 and DALL-E 3, they're able to generate amazing images, but then there's no current AI that truly understands image content
in a way that surprises us. Like, understanding is lagging behind, weirdly enough. So it might be that between generation and understanding capabilities, there's something interestingly reversed. But it's almost a paradox that in the near term, the risk is that we kind of overuse it, like take advice from it, and it would be wrong.
In the long run, you know, maybe the fear is that it's too good. I mean, in your talk, you expressed that, because it's such a different kind of intelligence, it's both the, quote, smartest by some definitions and the dumbest. Like in medical applications, my foundation would love to have the equivalent of a doctor for poor people, you know, who can never get access to that expertise, right?
But, you know, how do we test that? How cautious do we need to be when we have a hard time characterizing what we've got here? I wonder, part of me wonders whether, you know, that hypothetical AGI-like capability, if it did exist and if it's so good,
can it actually really answer some of the hardest questions that humanity faces, like climate change? Again, you know, some people disagree about what to do about it. Can AI really help answer those kinds of questions in a satisfactory way, in
such a high-quality, reliable manner, if AGI really truly comes? I don't know. Is it actually going to be good enough for that kind of purpose? And then that relates to your wishful thought about doctors. We somehow need to create these AI technologies that can benefit humanity better, but are they actually going to be super reliable? How much of a gap
will there be? I think that's very uncertain right now. We want to believe that it's around the corner in some sense, especially those technologies that can be really beneficial for humanity. You know, in my 20s, I definitely thought, you know, like for language translation, that there would just be a set of
processing steps, you know, okay, this is a noun, this is a verb, and that it would be kind of an explicit piece of logic. And so when Google found that their logic approach, which was a pretty large team, hundreds, was just beaten by their neural net approach, that was the beginning of this mind-blowing thing. So yes, we are often...
naive, particularly about what it takes to match human capability. I don't know for sure whether we are really around the corner or we are just opening a can of a lot of curious, fundamental questions about intelligence. And it might turn out that it's a lot messier than we expected.
It's a lot harder than we expected, and then building really reliable, trustworthy AI turns out to be harder than we thought. So I'm not necessarily saying that, you know, that is the truth either. We just don't know how far or close we are. Do you see a problem where the commercial applications of this and the money going into it
is, you know, a gold rush, even making the Internet gold rush seem modest? Would that possibly drain people out of academia who are doing the important work? Or do you see that happening somewhat? Yeah. Yeah.
Unfortunately, there's a leak from academia to industry. But actually, there's a bigger concern for me. Whether they're in industry or in academia, I do worry that a lot of people feel a bit hopeless,
in the sense that, you know, there are these really strong messages dominating the field, which is that scale is all you need, and, you know, GPT-5, -6, -7 will be even more amazing, and there's maybe nothing one can do about it.
And, you know, so there's a bit too much currently shifted toward prompt engineering as the main research focus, and I genuinely worry about that. Like, everybody doing the same thing cannot be good. I do hope that people, I mean, explore, you know, what happens with bigger scale out of curiosity. But the fact that there's so much emphasis, and all the companies, the major companies, now feel like
they need to catch up with ChatGPT. So I hear from many friends that there's a lot of this internal refocusing, reprioritization, which is totally understandable. But if this is a global phenomenon, that's not healthy at all. We need to put more research effort into safeguarding AI and building alternative methods that are more compute-efficient and therefore also have a smaller carbon footprint.
Yeah, we need to bring in, you know, math and maybe even physics people, but certainly math people. I mean, I feel lucky that I...
was a mathematician and then did computer science, because these models are very mathematical. You know, just being a programmer isn't really the training you need for this stuff. And currently, brute-force scale is the way to go, but there may be an alternative where, you know, sometimes these smaller models, the specialized models, do learn on a lot easier,
more specialized data. And the data is actually the key. And that data can be not just more data, but better data, high-quality data. Sometimes it's the data that was really designed to teach you that particular mathematical concept, for example. So when you think about humans also,
nobody learns very well just by reading random web data. We tend to learn better when there's a great textbook and tutorial. And so similarly, I do think that this is about how to transfer knowledge or information in the most efficient way. So that's another reason why I believe that the smaller model or moderate-sized model could have a major edge. But that requires innovation in how to get that information across
in an alternative way. I've got a turntable here, and I asked you to bring in a record album. So this music, it's called "Virtual Insanity," very relevant to our current conversation. But I used to listen to this when I used to work for you. Oh, wow. Yeah, here in Redmond. So this was before I did a PhD. Before coming here, I was excited to learn about this Microsoft programming
package called MFC. I don't know if that rings a bell for you. So I kind of taught myself, because it wasn't really part of the curriculum per se, and somehow I found a development job. Yeah, I used to listen to this.
The genre is acid jazz, as they say. But it's not really jazz, it's like a modern variety of it. And I believe they're from the UK, maybe. Virtual insanity. Wow. Yeah, right now it is virtual insanity. It's kind of like jazz and rap. Yeah, next thing we know, we'll have the AIs not only making the tunes but
the lyrics as well. What are some of the ways AI can help us improve the world that you're most enthused about? My wishful thought is for AI to really understand humans better than we humans understand ourselves. And I think that's fundamentally a reason why there's a lot of conflict, a lot of disagreement.
And I'm hoping that we can use AI as a tool to better reflect on ourselves, and then be able to communicate with each other better and coexist together more peacefully.
Yeah, I completely agree with that. It's kind of scary that we seem to be more polarized, and, you know, other technologies gave us hydrogen bombs and, you know, bioterror pathogens. And so it's just a dream, because the AI is not there yet. But if it could help us understand each other and maybe reverse
this trend toward polarization, that would be an incredible favor to the world. You know, a lot of people worry about AI safety, that it doesn't take over the world, but at the same time, maybe it can reduce conflict and improve understanding.
That's worth working on. Yeah. Well, thank you, Yejin, for taking the time. It was a fascinating conversation, and it's going to be interesting to see where it all goes. Yeah, likewise. Thank you so much for having me here. Unconfuse Me is a production of The Gates Notes. Special thanks to my guest today, Yejin Choi.
To be honest, I never imagined giving a TED Talk. I just don't have that kind of personality. But I got my arm twisted to do it, because basically the recruiting person told me that otherwise it was going to be just a lot of tech CEOs, who are also men. So that was motivating enough. She pushed the right button with me.