
#156 AI Reality VS Speculation with Google Machine Learning Engineer Jiquan Ngiam

2025/1/17

freeCodeCamp Podcast

People
Jiquan Ngiam
Topics
Jiquan Ngiam: I'm cautious about the hype around AI. While large language models have made remarkable progress, I don't think their reasoning abilities will improve as quickly as some predict, and AGI is unlikely in the near term. We're at different points on multiple S-curves of development: some areas are approaching a plateau while others are just getting started. I've worked in machine learning research for many years and saw firsthand the breakthroughs from the 2012 image models to the emergence of Transformer models in 2017-2018. Those breakthroughs came from scaling up compute and data, together with more effective model architectures. Training large models takes enormous engineering effort and time, and you have to overcome challenges like hardware failures. In recent years we've found that post-training fine-tuning (for example, reinforcement learning from human feedback) can improve model behavior and make it better match expectations. It's important to remember, though, that models are essentially just computational tools, and it's easy to fall into anthropomorphizing them. Code data is critical for training LLMs because it expresses reasoning in the most formal way. Multimodal models let AI handle text, images, audio, and other data types and integrate them into a unified framework, which enables more complex tasks such as generating code from an image. Going forward, models will become smaller, faster, and cheaper, inference costs will drop sharply, and software design will shift toward AI-centric, more conversational interfaces.

Quincy Larson: I'm optimistic about where AI technology is heading and how it will be applied. I agree with Jiquan Ngiam that improvements in model reasoning may slow down, but I also see huge potential in other areas, such as video generation and 3D modeling. I'm particularly interested in AI for code generation and programming assistance, which will greatly improve developer productivity, while also noting the limitations of AI models, such as hallucinations and dependence on training data. I believe combining personal data with AI models can make them far more useful, but privacy and security have to be considered carefully: AI systems should be designed so users keep control of their data and can understand how the model reaches its decisions. I look forward to seeing AI applied across many fields, and I believe it will dramatically change how we live and work.


Chapters
This chapter explores the current state of AI, particularly focusing on the limitations of AI reasoning capabilities. It discusses the S-curve of technological advancement and the hype surrounding AI's potential for superintelligence, contrasting it with the more realistic, incremental progress being made.
  • AI progress is more incremental than revolutionary.
  • Reasoning abilities are not advancing as rapidly as some predict.
  • AI development is characterized by multiple simultaneous S-curves of progress.

Transcript


Many people look at LLMs and go like, whoa, exponential growth, off to the skies. No, like, no, I'm like, actually, no, we are more like on this curve where it goes flat, then up, and it's going to flatten out again, like an S-curve. But we're on many S-curves. So this is where the bear and bull thing is. So I'm a bit bearish on reasoning abilities going off to the races and like, oh my God, this is going to be super AGI, I can reason about anything. I think we're more on the

Welcome back to the Free Code Camp Podcast, your source for raw, unedited interviews with developers. This week's musical intro, with yours truly on the drums, guitar, bass, and keys, is from 1996's Kirby Super Star: the theme for King Dedede. ♪


Back to the Free Code Camp Podcast. I'm Quincy Larson, teacher and founder of FreeCodeCamp.org. Each week we're bringing you insight from developers, founders, and ambitious people in tech. This week we're talking with Jiquan Ngiam. He's a former Google Brain machine learning engineer who's building tools to make AI useful for everyone,

not just developers. We're going to talk a lot about the practical limitations of AI, where AI is going, and maybe even what lies beyond transformers in terms of creating AI systems. We're also going to learn a whole lot about how AI agents work. I'm really excited about this conversation. The freeCodeCamp podcast is supported by

Wix Studio. They've created a grant, and Wix Studio provides developers with tools to rapidly build websites with everything out of the box, then extend, replace, and break boundaries with code. Learn more at wixstudio.com. Support also comes from the 11,113 kind folks who support Free Code Camp through a monthly donation.

You can join these kind folks and you can help our mission by going to donate.freecodecamp.org. Jiquan, welcome to the Free Code Camp podcast. Thank you, Quincy. I'm so glad to be here and to be talking to the audience that you have here. It's amazing people and the group and community you have here.

Yeah. Yeah. And a lot of them are excited to learn straight from the source from somebody who's been coding for, like, decades, who has worked with people like Andrew Ng over at Coursera, who has, uh,

I'm a scientist.

It's amazing. I started off with this dream and like, well, what if the computers could be smart? That's when I was young, right? And then after a while, I was like, can we go and work on that in research? That's why I worked with Andrew at Stanford for a few years in my PhD program there in machine learning. And I think only in the last few years, with the Transformer models, have we started to see that glimmer of like, oh, these machines could actually appear to be really smart at what they're doing. They could really be useful in helping us in our day-to-day work.

And it's just fascinating to see what we can do with it. And so that's been always a dream of mine and excitement to get into it. And yeah, it's amazing, I think. Yeah. Let's dig into it. Well, you've been doing machine learning for years. You did it at Stanford in your PhD program, which you dropped out of to become, I think, one of the first employees at Coursera, the massive open online course platform, doing software engineering over there.

What has been the big breakthrough here? Is the breakthrough just the transformer architecture, the transformer approach? Is that what brought about ChatGPT and the rush of different applications that have followed it? Yeah, I think, let's see. I worked in machine learning for a long time. Coursera was interesting in that. Maybe I'll give a little bit of a...

personal story there, which is when I started off in machine learning in 2009, 2010, 2011, for me, what was really interesting was how can we get more people to do research then? Because I saw the potential of this technology and what was very clear was, number one, how many experiments, how many things we could run quickly to learn about this technology was really important. And the more people that could learn and do it was also the bottleneck to progress.

So that was actually a lot of the genesis for me and Andrew putting out the first machine learning courses. If any of you guys did it and you went into the Octave assignments and ran into issues there, it's probably all my fault. Like, I was in charge of creating all the assignments. I wrote them up there. That's fun. Octave was like a programming language or like a machine learning toolkit?

Octave is actually like MATLAB, right? So it's like a free, open-source version of MATLAB. And back in the day, we were writing all our gradients by hand. So in machine learning, when you have an objective function, you write this thing called a gradient to tell the machine how to improve that function. So we were doing it all by hand then. And so maybe to your question, what was a big breakthrough was that the first breakthrough came in 2012 when the image models became really good.
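To make the "writing gradients by hand" bit concrete before the story continues: here is a minimal numpy sketch of what that looked like for logistic regression. This is an illustration, not the actual course assignment (which was in Octave), and the data and learning rate are made up.

```python
import numpy as np

def logistic_loss_and_grad(w, X, y):
    """Hand-derived loss and gradient for logistic regression.
    X: (n_examples, n_features), y: (n_examples,) with values in {0, 1}.
    Before autodiff frameworks, you derived and coded this gradient yourself."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))                # sigmoid predictions
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)               # d(loss)/dw, derived by hand
    return loss, grad

# Simple gradient descent loop driven by the hand-written gradient.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w = np.zeros(3)
for _ in range(200):
    loss, grad = logistic_loss_and_grad(w, X, y)
    w -= 0.1 * grad                             # "tell the machine how to improve"
```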

So AlexNet came about back in the day where our vision models suddenly took off, where we realized that if we scaled up the models we were working with, convolutional neural networks, and we threw a lot more GPUs at it, a lot more data at it. And this was the beginning of big data, really big data machine learning, a thousand categories, a million images. Now it sounds really small, but back then it was huge. Yeah.

That just blew things out of the water. So machine learning back then had these two camps, right? You're sitting down and figuring out what features to put into the model. Oh, I think image classifier should work this way. So you do that and then you try it out and you publish a paper on it. But then the method that worked and blew everything out of the water was, oh, just give it to a neural network, a convolutional neural network in this case.

put in all the data, train it for two days. Now, it was unimaginable to train things for multiple days back then. And then see what happens, see whether it actually gets there. And so the lesson, I think Richard Sutton was the one that published an article about this, "The Bitter Lesson," which is that scaling up

Scaling up data, scaling up compute, scaling up training. This is the thing that has consistently worked over and over again for machine learning. And so we first saw that in 2012. Now, that was a very critical point because it then caused everyone to invest into GPUs, invest into training models on GPUs. And then that took off. That worked really well. Image models are amazing. But we hadn't really figured out language yet.

And for language back then, we were using recurrent neural networks, which are models that predict the next word based on some kind of state of the previous words. We didn't really get that. We hadn't really figured out the architecture that allows us to scale it up. And I think around 2014, there were papers on translation that started to bring in the attention-type mechanisms that led to the transformer papers.

That was Bahdanau et al. And then in 2017, 2018, that's when you see Transformers came about, which took that idea and said, okay, what is the recipe for the architecture we need in order to have it scale up — an

architecture that could not only scale up, but be robust in how it works. It doesn't fall apart when it scales up; the gradients are not exploding. And so the team at Google — Noam Shazeer, Łukasz Kaiser, just a bunch of people working on this — created Transformers. And then once that recipe was figured out in 2018, we had the convergence of a great set of things. One, there was investment into GPUs and, at Google, TPUs, for many years now. So the compute was ready.

Two, there was a recipe for an architecture that could keep scaling, in which you keep adding nodes and keep making it bigger, but it didn't fall over. And number three, all the data was there. We had all this data we had collected on the internet — language and images, videos now — and we were ready to train these models. And then the part that took a lot of conviction from the big companies, the teams, was:

Are you willing to spend a whole month training this thing? Sitting down there, engineering the entire system, wiring it up, and then making sure that it runs. And then one GPU or TPU is going to fail, so you're writing the software that manages that. And so it became a very massive engineering undertaking in the end. But once the conviction was there, the takeoff happened pretty quickly, because once people started seeing the scaling laws, it went from there.

That's so much. I'm going to break all that down and I'm going to recapitulate it to you to make sure I've correctly understood you. First of all, just one of the things you said there: you're training for an entire month, one of the TPUs falls over,

Does that end the training process? Does that corrupt the training process? Like, how robust and resilient was it? So TPU, tensor processing unit — you may remember, if you're listening to this, Sundar Pichai came out and he had this big box and he's like, this is a TPU. And this thing is able to do massive parallel computations and stuff. And it was basically hardware designed specifically to be able to

train models, essentially, is my understanding. So let's say, how easy was it for this month-long training process, a considerable amount of investment of engineering talent and then actual time just running the process? Was it possible for that process to completely crash and have to be restarted? So definitely. In the real world, random things happen. Your hardware is not perfect.

In fact, you see random bit flips, like random quantum bit flips, and then, you know, the data's corrupted. So the first thing that we had to do — or really anyone training models — was, you know, be able to checkpoint and resume.

So you can train something. If it crashes, roll back to the last checkpoint that you know is good and resume from there. The next thing that you do is you kind of on the fly watch out for issues in your training. So if the training run is going, you go see this big blip. You go like, okay, that's probably not a good thing.
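A minimal sketch of that checkpoint-and-resume pattern, using PyTorch-style APIs; the file path, checkpoint interval, and toy model are illustrative, not what any particular lab actually runs.

```python
import os
import torch

def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
    # Persist everything needed to resume: weights, optimizer state, progress.
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    # Roll back to the last known-good state if a previous run crashed.
    if not os.path.exists(path):
        return 0
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
start_step = load_checkpoint(model, optimizer)   # 0 on a fresh run

for step in range(start_step, 10_000):
    x = torch.randn(32, 10)
    loss = model(x).pow(2).mean()                # stand-in training objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 1_000 == 0:
        save_checkpoint(model, optimizer, step)  # resume point if hardware fails
```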

So how do you figure out when the blips are happening, and react to that, and be able to do logging and monitoring? Logging and monitoring is critical. Exactly. And then the third thing, I think, is you add to the training architecture a lot of little hacks that add up to a lot. So, for example, there's something called gradient clipping.

Your gradients shouldn't exceed some value, so you clip it at some value there. So if an example comes in and it's going to dramatically change the weights of your neural network, you don't allow that to happen. Every example has a limited ability to change the network in one way or another. So you add up all these things in there. So it's like removing extreme outliers, essentially. That's right. To keep it from pulling the mean too far in one direction. Correct. It's the extreme outliers — you need to manage all of that.
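And the gradient clipping trick he mentions looks roughly like this in PyTorch-style code; the max-norm value of 1.0 and the toy model are illustrative.

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()

# Cap the overall gradient norm so no single batch (an "extreme outlier")
# can move the weights too far in one update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```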

But I think this kind of goes back to a thing that has changed a lot in machine learning. Because if you roll back 10 years ago, like any time in the last decade, 2010 to 2019, 2020, a machine learning project had to involve a lot of work, a team getting data, a team training models, a team evaluating them. And then finally, many months later, you try to deploy it and then see whether it actually impacts your application in a good way. This decade, the LLM decade,

has big teams now training LLMs for you. And as a user of it, you don't have to worry about that anymore. You just take it and use it. So the time that you need to go from an idea to trying a machine learning based idea out, a predictive idea out, it's much shorter now. It's just in the hours, minutes, maybe.

I think that has been dramatically changing how we think about how we develop and deploy AI. So the iterative process has gone from months to potentially minutes. Yes, I think so. Several orders of magnitude in terms of time performance gains. That's substantial. All right. So here's where I'm going to recap everything.

To the best of my ability, everything you said, because I like to... I take extensive notes when I do these interviews, and I like to process everything I'm hearing and not just nod along. I feel like lots of interviewers just do hand-wavy stuff, but I'm serious about engaging in this stuff as a relatively layperson, as a non-machine learning engineer. Let's talk about the history of machine learning. So as of 2012, you said...

Huge breakthrough. Image models got really good. I remember one of the first assignments in Andrew Ng's machine learning course, one of the most popular massive open online courses ever created, I believe was like recognizing handwritten digits. Oh, yes, it is. And classifying them. Yes. So there are lots of different tasks that you can give.

a machine learning model, but one of them is classification, where it's trying to sort — like, from the image, break the image down and understand, okay, what is it likely to be? Is this digit that I just drew a one, or is it a seven? Right. Things like that. Um, and what is, probabilistically, you know,

the likely outcome, the best prediction it can make. So 2012, we had a big breakthrough in image models, and they got really good. And then at that point, we kind of figured out, as a human species, researchers figured out that they could just throw a lot more GPUs, graphic processor units, parallel processing units,

at this, a whole lot more compute. And there were two camps of machine learning that emerged. One that was focused on figuring out which features to put into the model, which is, you know, like an image classifier, how it should work. And then what was the other camp? You mentioned there were two camps. So this is like around 2012, around the pivot, so to speak, to all neural networks. Yeah.

For the longest time before that moment, a lot of machine learning was very carefully hand-designed machine learning, which still has a big role to play, by the way. So, for example, you'll be designing the right filter to use on an image, which is like, what is my detector on an image?

Or you'd be designing for language, you'd be trying to figure out, very careful designing syntactical structure for language. And this is what syntax is, this is what grammar is and so on. And then the flip side is don't hard code the designs, but instead let the data tell us what it should be. So whatever filter you need to use on the image, let the data

let a model learn it. Whatever syntactical structure, grammar, don't hand design it, let the model learn it. And so there's pros and cons. The pro of that is that the models, if they can learn it, become extremely adaptable to whatever

you know, it might be in the real world, but the con is that it's actually much more difficult to understand what's going on. So it's a bit more of a black box. So the model makes a prediction and you go like, I have no idea why you predicted that, but right. That's great. But at the end of the day, what really mattered to all of us was, um, the quality of the outputs — is the quality of the prediction good, right? Is it going to make a difference in the user's experience when they use a product?

And if you think about that, users don't really care about how it was built, the sausage, and whether it was hand-coded or learned. It's like, hey, is this actually going to help me in what I'm working on? And so that method of letting the model decide, letting the machines learn on their own, just won out in the end. Yeah, yeah. I mean, like a practical analogy, sausage, and I apologize to the vegetarians listening. But... Yes.

A lot of people don't really care how the sausage is made. Is it affordable? Is it delicious? You know, like those would be more important criteria than, okay, was it ethically sourced meat? Or were there like any, you know, preservatives added that like 10 years from now might be outed as potential carcinogens and things like that, right? Or, you know, what's the cleanliness of the meat?

where the sausages were produced. Those kinds of considerations, right? Some people do care how the sausage is made, and we're seeing that play out in AI where there are a lot of people advocating for kind of like this no black box AI. And yeah, so it is interesting that... I'm not sure where you would consider this, like supervised versus unsupervised or like...

how would you define, like, a model where there have been kind of guardrails and like a roadmap given to that model, versus something that is literally learning from first principles through its training process? So I think the main question is, where do the guardrails come in? Yeah. Like, do they come in in the design of the system, or do they come in, you know, in the design of the data, or

Or do they come in post-training, which is the common thing these days, which is you train the model and then after training it, you do something called post-training where you encourage it to do some behaviors and discourage it from some behaviors. And so I think what we're seeing is that rather than have the constraints come in in the design of the system, they come in in the design of the data and the design of the behavior of the machine learning system.

So for example, data quality and data cleanliness is super critical because these models are essentially a reflection of the data. If you give it data that's all of one type and it's all biased in a particular way, it's going to give you the same biased outputs. If you tell it a coin comes up 75% heads all the time and that's in the data, it's going to tell you that that's what the coin is going to be like.

Garbage in, garbage out. Garbage in, garbage out. Exactly. And then post-training, what we have learned in recent years — pretty recent actually — is you can use reinforcement learning to essentially encourage the models to behave in a particular way, to reject some responses, to accept some responses. I think I've heard terms like constitutional AI from Anthropic. Yeah. RLHF is another one. Yeah.

And I think... So, like, a good example of this maybe would be, like, the LLMs have spent a lot of time on Reddit, but they probably also read a whole lot of terrible stuff on Reddit. Like, a lot of... I mean, there were a lot of subreddits that were just, like, outright banned, but there are still tons of extremely toxic, nasty... Or, like, 4chan and stuff like that. Like, the model might have spent some time on 4chan. But the, you know, the post-training process...

where there's reinforcement learning with human feedback. RLHF, I think is what it's called. That's right. Those people are being like, oh, no, you can't talk like that. You can't talk like you're on 4chan when you're responding to a fifth grader's math question or something like that. Maybe all that stuff was in there, but it didn't know right from wrong, and it kind of learned that from... Or what is...

you know, cultural norms and stuff like that. Like it learned that during the post-training process. Totally. And I think, I think one thing to keep in mind is the way I think about it is, uh,

It's very easy for us to start to anthropomorphize the models. Right. And I think it's not bad. It's a good thing because it is actually the best way to work with the models. You think of it as a human. You work with it as a human. That's the way to work with it. But then it's easy to fall into the trap of thinking that it's like a human. Yeah. It is just a calculator. It's a calculator on words. You punch in a bunch of words and more words come out. Yeah.

It just predicts the next likely token. Exactly. And that's all, like, GPT responses or any sort of thing that you're getting from AI is — like a prediction of what it thinks is correct. And I just want to define anthropomorphism. Anthropomorphizing is a very large word, but it basically means just, like...

endowing with humanity where humanity is absent. Like you can anthropomorphize a paperclip and suddenly you've got Clippy, right? Yeah. Yeah. So I just want to make sure like that giant word that people may not be familiar with. I just want to define it and I want to go deeper on this, but I just want to go through my recap real quick. Cause I think it's really interesting to look at the history. So essentially like for many, many decades, I believe my understanding is that

conventional wisdom is you have to give the roadmap in either the data or you have to give it in the training process. And so most AI involved a whole lot of like artisanally crafted corpuses of data that were fed into it or that were heavily structured, right? Like think of a SQL query as opposed to like a NoSQL like

storage database or something like that, or even maybe less structured than that, like a landfill versus a shopping mall where everything is nice and categorized and organized, right? Technically, there are lots of atoms and there might be a lot of useful stuff in that landfill, but it has to be completely like... It's going to be a Herculean effort to get everything sorted and actually extract value from that landfill, right? So...

for a long time, people were basically doing those two approaches. And then once we started adding a whole lot of compute after, uh, 2012 ish, uh, I think you said 2017, 2018 is when transformers came about and that's what kicked off the whole, Hey, we can just post train these models. It,

Is that accurate? I want to make sure I capture this accurately since I'm talking to somebody who absolutely knows this stuff. So I think the main difference there is two things. One is that the invention of the transformers was meant for language initially, attention-based models. So one is that we didn't have a good way to scale up a language model that actually worked really well across very long sequences of text.

And the Transformers was the first approach. I was like, hey, this scales. This actually works if you keep making it bigger. And by scale, you mean like you can have a whole bunch of processors working in parallel and you can train exponentially faster than if you just have like a single core processor.

Yeah. So I mean, scale maybe in maybe a few different ways. One is compute, which is what you're talking about. Like we have now the compute ability to scale because we invested, you know, that half a decade worth of like,

engineering to scale up compute for machine learning using GPUs and TPUs. And number two is that the model's architecture could scale without falling over. So the gradients that we talked about earlier don't blow up anymore and there are ways to have it be more controlled.

And then number three is that we have all this data that we can scale on right now, all the language data in the world or the data that we have. Now there's a number four in there, which is kind of interesting, which I think led to the explosion in interest here, which is the architecture of the transformer, the way it was designed was actually very amenable, not just for language, but also for vision, for images.

And so the way it works is that for language, it's really obvious. You know, you take every word, or every sub-part of a word, and say, that's a little unit that goes into the model. For images, it turns out that you can take pixels and say, hey, this patch of pixels can also be fed into the model as if it were a word — treat it like a word. And before the transformers, we didn't have an approach that very nicely tied together the two modalities.
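A minimal sketch of that "treat a patch of pixels like a word" idea: cut the image into fixed-size patches, flatten each one, and project it into the same embedding space the text tokens live in. The patch size and dimensions are illustrative, and a real vision transformer adds position embeddings and more.

```python
import torch

def image_to_patch_tokens(image, patch_size=16, embed_dim=512):
    """Turn an image (C, H, W) into a sequence of patch embeddings,
    analogous to word embeddings for text."""
    c, h, w = image.shape
    # Split into non-overlapping patch_size x patch_size patches.
    patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)
    # A linear projection maps each flattened patch to a "token" vector.
    project = torch.nn.Linear(c * patch_size * patch_size, embed_dim)
    return project(patches)                      # (num_patches, embed_dim)

tokens = image_to_patch_tokens(torch.randn(3, 224, 224))
print(tokens.shape)                              # 196 patch tokens of dimension 512
```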

And now the same approach could seamlessly just accept both inputs, text and images, and represent them in the same world, the same space that the model understands. And I think that's the number four part that was really interesting, which led to more investment, because people started to see the potential for building a single model that could capture not just language, but language and vision. Now also audio, video,

essentially almost any modality you throw at it, if you have a way to tokenize it — and that's a term we use, a way to make it into tokens, into words that the machine understands — they could all become part of this unified ecosystem now. So that was a really exciting moment, because that was the critical insight in Transformers. You can use this for everything. Yeah. And that's why AI music,

AI art — and I'm using these in kind of air quotes because I'm not a fan of AI slop. But it is impressive what it's doing, even if I think it's useless. And I'd much rather listen to something composed by a real human or performed by a real human. So just take it...

Take that with a grain of salt. I'm not one of those people that's like, I can't wait for Hollywood to just collapse so we can just have AI actors and everything. I don't think that'll be particularly compelling or interesting. We'll see what happens. But I do want to point out that there was suddenly dramatic improvement in the types of images that AI could produce, the types of music that AI could produce, the types of speech and the

Verisimilitude, the realism of the speech. One of our engineers at Free Code Camp just took maybe two or three, they took like four hours of me reading an audio book. Episode 100 of the Free Code Camp podcast, I celebrated by reading an audio book version of my book. Sorry for that digression, but if you ever want to hear that, that's the training data that was used. And then suddenly she went in and she had me speaking Spanish fluently.

fluently, even though my Spanish sucks. And she had me, like, giving this weather forecast in Spanish and doing all these other things — these words that I'd never even said — and it sounded just like me. And I was like, whoa, that's really impressive. But a lot of these different things happened. And of course the video is very impressive, even though it is very, uh, uncanny, and it's disturbing to watch a lot of the AI video. But, um, it is very impressive that all these

started to take off at about the same time. And the reason they all took off at the same time, if I'm understanding you correctly, is we figured out ways to tokenize things and feed them into LLMs so that, for other media, for other modes — it's called multimodal —

we could harness that power that we were already harnessing with words. Cause we've had a good, you know, word calculator for probably, like, six or seven years. Even if you go and look at GPT-2, it's not bad at all. It could write, like, rudimentary — you know, write me a book report about Shakespeare, right? We've had the ability to do that for, like, five or six years and it was pretty good. It was maybe on, like, a fifth or sixth grader level. And then recently, you know, it's

passing the bar exam and stuff like that. So it's, it's definitely gotten better with text, but that process of leveraging that same phenomenon that works so well with text, uh,

with other modes is kind of how we got this big explosion in lots of different forms of AI all at the same time. Totally. And in fact, I would maybe say two things about that. That's really interesting, right? So if you think about what you just mentioned, someone on your team took your recording and then made it into Spanish, like it sounded like a native Spanish speaker. Now, what did the model have to understand to learn to do that? Number one, it had to understand how to translate between English and Spanish.

And it turns out there's a lot of text that comes in paired translations — there's an English text, there's a Spanish text, like UN meetings and all. There's a lot of that. So it trained on that. There's a lot of speech-to-text data, from English speech to English text, the content itself, and maybe for Spanish as well, Spanish speech to Spanish text. And so what it did here was that, because it understood everything, the model...

If the model was only trained on, say, translations, it's not going to be able to understand speech. If the model was only trained on speech, it wouldn't be able to do translation — it could only go from English speech to English text. But because the model was trained on both speech and the huge corpus of language that we have,

which includes translations, now we can do both together, and you get this interesting effect where you can go from speech to speech, where you're not actually seeing the intermediate transformations — it's a black box that goes end to end. So previously we would actually design systems that would go from speech to text, translate the text, then back to speech. That's actually something people would do, but now you can go all the way through, and that's fascinating. And maybe tying that back to a bit of the coding: one of the most valuable kinds of training data is actually code data.

And so, I'm not sure if we've talked about this, but code is actually the most important training data, in my opinion, when training LLMs, because code is the most formal way of writing down reasoning. You're saying this is what you're going to do, and do this next, da, da, da. If you write it in English, it's very ambiguous; code is very precise. And so it's a way to express reasoning. And what we've seen is that as models are trained on more and more code, their reasoning abilities actually do get better, even on non-code tasks.

And now take that and now take, say, the same model also understands images.

And so what you're starting to see this year, very early stage is you can now send a screen capture of a website or a mock or something that you want to create and say, hey model, write the code to produce an app that does that. And the input is an image, right? And because again of this multi-modality that it has, it's able to tie it all together and say, oh, this is the JavaScript and HTML code I should write to produce that picture.
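As one concrete way to try that screenshot-to-code idea, here is a minimal sketch using the OpenAI Python SDK; the model name, file name, and prompt are illustrative, and other multimodal APIs work similarly.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode a screenshot (or design mock) so it can be sent alongside the prompt.
with open("mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any multimodal model that accepts image input
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Write the HTML and JavaScript for a page that looks like this mockup."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)  # the generated HTML/JS
```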

And that's really fascinating, right? Because now we're starting to bridge into really practical, useful situations here, where you're like, hey, the website has a bug, here's a screenshot of what the bug looks like — can a model actually figure out where in the code the bug is and fix it? And these are the things at the cutting edge, I think, where I've seen people experiment with things that tie together multiple modalities in an engineering sense. Yeah. Yeah. It's...

I just want to reflect for a moment on what you said about code being the most formal way that humans express reasoning and critical thinking. A lot of people, contrary to what you might believe, if you're a philosophy student, most humans don't sit around. There aren't a whole lot of humans paid to sit around and write logical...

and things like that. Most people who are paid a lot of money to think really hard are developers. And they're probably trying to think really hard and come up with all these edge cases and all these...

design specifications into functional code. And the way they do that is they figure out how computers are going to understand things and how they're going to process things. And then they work backward. Okay, what series of very precise instructions can I give the computer to put one foot in front of the other? So these kinds of dumb computers can do pretty remarkable things like, uh, for example, serve streaming video, or use, like,

GPS satellite data to figure out where in the house my cat is and things like that, right? So I think it's really important to acknowledge and understand that that is a key competency of developers is taking the complexity, the ambiguity of the natural world around us and constraining it

into something that can actually be run by a computer deterministically, that reliably works, and then figuring out how to take that code that they just wrote and put it into, you know, a gigantic code base. Like Google, where you used to work, famously has a two-billion-line code base — and it's probably a lot bigger than that; that was what was reported many, many years ago — but I think it's the biggest code base in operation. But

I mean, if you think about the staggering complexity of that and how no developer is necessarily going to really have even the most naive inkling of what the entire code base does, and they're probably just focused on a very tiny portion of it, trying to understand the edges of that portion so they can fit their code in on the right API endpoints and stuff like that.

This is where I want to transition into talking about these breakthroughs that AI is having with coding, because I know that's where you're working. Like, we haven't even mentioned it yet, but two years ago you founded this company called Lutra, which is focused on essentially empowering laypeople — people who are relatively semi-technical —

with the abilities of a developer, allowing them to essentially harness the power of extremely precise code instructions and just be able to translate their human, natural language instructions into code that actually runs. Would that be an accurate assessment of what you're trying to do? Yeah, that's a totally accurate way to say it. I'm happy to show quick demos if we have time for that. Maybe you could just verbally walk us through a demo because a lot of people listen to the podcast rather than watch. That's right.

That makes sense. So what they could do in Lutra, for example, a very common task, I'll just give an example and then we can go into what it does there. Sure. You can tell it like, hey, Lutra, could you go to, say, figure out all the coffee shops in San Francisco and then put that in a Google Sheet. And then Lutra would then figure out how to use its tools like finding the web information, searching maps, and how to create a Google Sheet and everything.

given this is AI that understands all these tools natively and is able to work with your data, create it, right? And then so it can do things like fill out spreadsheets for you and then do more research for you and say, hey, for each of these places you found, I want to know what equipment they use. I'd like to know more details about them. Are they hiring? What kind of baristas they have? What beans they use? And so you start to do a lot of things in there. And our goal is to see how far, you know, how much can we delegate away to the machine

the very repetitive manual tasks, and be done with that, and

allow people to then get time back to do more high-level things, right? Strategic thinking and so on. Now, the hard part of this, and a lot of this starts to sound a little bit like AI agents. Yes. That's what people have seen. And the hard part of this is really figuring out, number one, what is the right, I'm calling it the AI computer interface to build, to figure out that this is how AI should, or LLMs should think about working with a software stack, right?

Number one. Number two, how should it understand the intent from a person, from ourselves on what you want to do? How should we go back and forth with them from a human computer interface perspective?

And then how do you architect a system that stays on track, does the task, and scales up very nicely? And so at the core of what we do, which is I think really interesting for this community, is that rather than have the AI take one action at a time — like, do a web search, do this, add a row — what we do is have the AI write software behind the scenes and run that software.

So this is almost as if you hired a junior engineer that could write software for you on the fly, just as you want it, for what you want at that moment, and then run the software for you at that point in time. Right. And then it's able to look at that, fix it, keep going, and so on. And so that's what we're doing there. Now, the major use cases right now are, like, you know, research, spreadsheet filling, emails — because people spend a lot of time in spreadsheets and emails. And there's a lot more we can do with that too. Yeah.

Yeah, okay, so I 100% use spreadsheets all day long. I use email all day long. I'm a huge email stan. I think that this open protocol that is 50 years old at this point is still way better than using Slack or something like that. We do use Google Meet.

for free code camp. By the way, we're not completely doing everything through email correspondence, but I would much rather have everything in my email inbox where I can easily search it, where it's plain text and I can easily manipulate the text and potentially in the future take it and export it and use it in novel ways.

than have it distributed between Twitter DMs and Slack and all these different places like that. I'm a huge proponent of using email almost as an external brain, which I'm constantly referencing and searching through. And so I think it makes a lot of sense. This is just an offhanded comment. It makes a lot of sense to focus on email and spreadsheets first because you can't boil the entire ocean. You really do need to focus. And those are two very...

high impact places to focus. So I want to step back and reflect a little bit on something you said. You said that Lutra is taking the instructions and it's kind of like, okay, I get it. Okay. And it's like, it's almost like it's coding as you're talking. And then it's, does it run the code and then iterate on the code? Like, like it's writing like a data scraper to go and scrape Yelp or something like that, or to interface with Yelp's API legitimately or something like that. Right. And, um,

And it finds, like, oh, this didn't work. Like, it's going back and iterating on this code. Like, maybe this is a deprecated endpoint, let me look at the docs, you know, stuff like that. Is it thinking like that? How does it work? Amazing. Yeah, that's a really good question, because I think you hit it on the head, right? Like, number one is that the models, the machines, they don't get it right the first time.

Because they're going to try it out and they go like, oh wait — if you went to Yelp and you clicked around, you're kind of exploring a bit and you're understanding the world it's exploring. So yes, our AI does actually react to that. So what it does, I would describe as this framework of OODA. I'm not sure if you've even heard about it. OODA? OODA. What does it stand for? Observe, Orient, Decide, Act —

and then observe again, orient, decide, act again. So what Lutra does is run that loop of, okay, let me observe the world, which is: okay, you're trying to do this. Your spreadsheet looks like this. These are the headers. These are the example rows in there. Now, orient is like, okay, based on that,

What should my plan be? How should I intend to update this data or what should I get it from? Okay, I don't know enough yet. Let me go to Yelp and see if I can get that Yelp page correctly. Okay, let's act on that and say, okay, let's pull just the page from Yelp and see what data we get. Okay, now we got that data. Okay, go back to observe again. Let's observe what data we got from Yelp. Okay, it looks like we want the ratings, the reviews, and the number, like maybe a summary of the last three review feedback.

Do we have that on this page? Oh, we don't have it. Let's try a different page now. Orient again, act again. And now we have it. Okay, great.

Now that we have it and we can update it, let's go try to update the spreadsheet. Let's do it for one row. Yeah. Okay. And then that worked. Okay. Now let's do it for three rows. That still worked. Now let's ask the user, are they ready to do this for a thousand rows? It's like dipping its toe into getting the process done to see how it works. Checking the feedback before it commits fully to running its script. Exactly. So whatever I described to you actually literally describes a very common user experience today on the platform. It's like that's what happens right now.
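A minimal sketch of that observe-orient-decide-act loop as code. The llm_decide and run_tool helpers are hypothetical stand-ins for a real model call and real tools (web fetch, spreadsheet update), not Lutra's actual implementation.

```python
def llm_decide(goal, observations):
    """Hypothetical: ask an LLM to orient on what we know so far and decide the next step.
    Returns a dict like {"tool": "fetch_page", "args": {...}} or {"tool": "done"}."""
    raise NotImplementedError

def run_tool(tool, args):
    """Hypothetical: execute a tool (fetch a page, write a spreadsheet row, ...)."""
    raise NotImplementedError

def ooda_agent(goal, max_steps=20):
    observations = []                                 # Observe: what we've seen so far
    for _ in range(max_steps):
        decision = llm_decide(goal, observations)     # Orient + Decide
        if decision["tool"] == "done":
            break
        result = run_tool(decision["tool"], decision["args"])       # Act
        observations.append({"step": decision, "result": result})   # Observe again
    return observations

# Usage, once real implementations are plugged in:
# ooda_agent("For each coffee shop in the sheet, add its rating and a short review summary.")
```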

In fact, yesterday we had... Do you see it unfolding in real time? Yes, you do. Maybe we should just verbally describe. So, like, okay, do this. And, like, with GPT-4, for example, you can use, like, the... It was called, like, the code interpreter for a while. I'm not sure what they're calling it now. But basically, you could run... It would write Python scripts, and you could view the Python script. And as a dev, I can understand, like, oh, okay, I can see why it's doing these things.

And even though AI is still a black box, the fact that it's creating this code, and that code is what it's, kind of like, using as an intermediate step for it to process and move on. By the way, OODA — I just want to repeat that for everybody who's interested in this acronym: observe, orient, decide, act.

This is not new to machine learning. This is actually something developed by the U.S. Air Force, I believe. And that may set off alarm bells. We're arming AI with our military's best practices and stuff like that. But it does make a lot of sense. So...

Just to understand, like, a typical use case. Let's say hypothetically I said, like, I need to — we'll use that example that you said earlier. Like, I'm applying for a job as a barista. And let's say I'm not a dev, or maybe I'm a dev who just needs to work somewhere for a while while they finish going through the freeCodeCamp curriculum, right? Very common use case, actually, because I always tell people don't quit your day job.

Do whatever you need to do to keep the lights on. And then don't start burning money and stuff like just save your money and work somewhere and learn a few hours a day. That's what I tell people. So maybe somebody is listening to this and they're like, man, I don't have a job. My advice, of course, just go get whatever job you can and

And this could be a way that you could do this is you could say, hey, show me all the, you know, try to make a list of all the coffee shops within a two-mile radius of my apartment or something like that that I could reasonably walk to or take my, oh, and you could even say that are near a major subway station if you're in a city that has a subway, right? And so you can pass all these criteria and then you can kind of watch it reason through and eventually it's going, the ultimate output of that will be

Like a Google Sheet that has a breakdown of all those places: what they're likely paying, whether they're likely hiring. You know, you get the hours of operation. So, like, I don't like to work overnight — maybe you rule out the 24-hour coffee shops just because, you know, even if the manager says they're not going to call you in to work the late shift, that always happens. Like, I've worked retail before. That always happens. Right.

So you can feed all your different criteria and then you can use that to do, like, a really smart job search. Okay, like, which coffee shops should I even bother applying to as a barista, right? And then, you know, it might include lots of chains and stuff like that. And if it's, like, you know, a one-off shop or a smaller shop — like, I interviewed on the Free Code Camp podcast the general manager from a chain of coffee shops in the Knoxville, Tennessee area. He runs a ton of shops. He's a dev and he's also a Google Sheets

guru, and he just published a Google Sheets course a few days ago on the freeCodeCamp YouTube channel, if you're curious and you want to learn how to use Google Sheets as, like, kind of a dev. But

I digress. So you could take all this, you could feed that in to Lutra, and Lutra would show you step-by-step everything that's happening and all the decision-making, and you could even potentially jump into its decisions: no, no, no, I see where you're going there, and we don't care about that — just don't waste your time going down that path, just do that. So you can, like, kind of almost fine-tune the prompt information

somewhat interactively, or how would that work? Exactly. Exactly. So you hit the nail on the head there, which is, like, you get there and, you know, Lutra starts executing on this, starts doing work. You can start seeing it do work. Now, what you mentioned, Code Interpreter, GPT-4 — that's one way to do it. But if you think about Code Interpreter, when you run code, you don't actually see a lot of output.

Right. Unless you put the print statements in there, you don't see anything. It just runs, right? I mean, you could tell GPT to put a bunch of print statements in there. Yes. So what we're working on — what we designed — is actually a whole code interpreter environment of our own. We wrote our own interpreter, actually. And that actually starts to say, okay, when you run code, don't just have it run blindly, but

how can we very smartly pick out the things it's acting on and show you those pieces? So when it's reading a website, we don't just print the website to the screen — that doesn't make sense. We'll show you a little preview, like, hey, this is what it's reading. Oh, it's trying to extract data from the website — here's a little preview, this is what it's doing. Oh, it's trying to write to a sheet. Oh, it's starting to write rows now — can we show you the sheet and show you where it's updating? And that whole experience of showing

showing you viscerally what's happening as it's happening — not when it's done, as it's happening, live — is actually a really key part of it. So as a user of this, you can sort of see it happen. And then when it's like... The way you're describing it, like, I don't know if it's ever been described this way, but I did a lot of pair programming back in the day. Like, early Free Code Camp, building Free Code Camp was a lot of pair programming. A lot of times it was me working with somebody who knew a lot more than me, and they were watching over my shoulder and coding. And a lot of it was me watching and seeing how they would do it,

given the same problem and being like, oh, interesting. You're doing this, this, and this. So it's kind of like almost you're like surfing on their shoulder, like watching as they're working. And so you can chime in and you can correct them. And you might even be able to catch like a mistake that they're making because believe it or not, these models make mistakes, you know? And you can say, oh, don't use that approach. Like here's a more efficient algorithm, you know, something like that. Like if you actually know how to code and you see it doing like a bad, you know, they're not using the right search operator or they're using some, you know,

No longer a best practice. I'll give you an example of that one. So one of the use cases for Lutra is like, hey, give me all these locations on the map. Now, maybe you sell to the local businesses and you're like, hey, this is a day trip for me. I need to visit all of them. Ah, the traveling salesman problem. So one of us did that and was like, hey, can we get it to solve that problem? Hey, can you create a little route for me today and put it on the map? So Lutra runs that, runs the code, generates a map. And we're like, man, the route was not great. It's kind of weird, right? And then we're like,

oh, can you use the 2-opt algorithm, you know, some TSP heuristic? Okay, solve it. And then sometimes the models understand what that means, and then they implement that algorithm on the fly and start to run it. And it's like, wow, this is a great path now. And then: can you make that whole little route that you made today into a form that I can bring out on a PDF? Yeah.
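For reference, the 2-opt heuristic mentioned here is simple enough to sketch in a few lines: keep reversing a segment of the route whenever doing so shortens the tour. A pure-Python sketch over a plain distance matrix (the distances are made up):

```python
def route_length(route, dist):
    # Total length of a closed tour over a distance matrix.
    return sum(dist[route[i]][route[(i + 1) % len(route)]] for i in range(len(route)))

def two_opt(route, dist):
    """Improve a tour by reversing segments until no reversal helps."""
    best = route[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(candidate, dist) < route_length(best, dist):
                    best, improved = candidate, True
    return best

# Tiny example: four stops with symmetric distances.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(two_opt([0, 1, 2, 3], dist))
```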

Then it's definitely like, great, I can make a PDF for you. Here you go. This is your route for today. And it generates that. And then behind the scenes, it's all code that's generating. And code is really interesting because it is the native language of a computer. Now, if you imagine an AI trying to interact with a machine,

Right. Should it interact through the UI? Should it interact through code? Should it interact through something else? Code turns out to be a very native thing. Yeah. And we can go deeper into that. But I'll mention one other thing in there, which is that when I describe the coffee shop example, you can start thinking about this. What we've seen is that people use it for

for a lot of things beyond that. You have a list of companies and leads you're reaching out to. Maybe you want to run podcasts with them. Or not. You want to research a bit more about them. Now, can you ask a machine to do that all for you? Hey, for all those people that I'm interested in talking to, companies or leads of accounts, go find out more about them. Have they talked about things on the internet? Have they been on other recent podcasts? In fact, researching that. And also, that whole experience of building out information that you own is something that Lutra does really well.

And I think behind the scenes it's all this, uh, code-driven thing. Right. And I think potentially cross-referencing your own leads that you've generated through, like — like, there's a conference and there's a sweepstakes or something, or people put their name and business card into a bowl or something. So you've got a whole bunch of business cards, and maybe you used OCR, optical character recognition, to get them put into a Google Sheet or some sort of structure. And so now you've got kind of a makeshift

database and they're like, okay, who would these people potentially know? Or are there any common connections among these people on LinkedIn? And like, I can only imagine the, the applications for sales, uh,

which I don't know a whole lot about sales. By the way, Free Code Camp, the extent of my salesmanship is inbound. People send me emails every day, like salespeople and people like that, and they want to partner with us or create a grant for development of courses, things like that. But I know very little about sales, so pardon my naivety on that front. But let's say hypothetically you were trying to sell some big enterprise company

Like, let's say maybe you create, like, something that goes into semiconductor manufacturing, right? And you need to know all the people, you know, kind of like...

upstream for you, who would potentially need to use that and would be using a competing, uh, you know, tool or something like that, right? Like, again, exposing my relative ignorance of how the semiconductor industry works as well while I'm at it. Uh, Chip War — Chip War is a pretty good book, by the way, if you're interested in semiconductor manufacturing and the history of it. But, um,

Let's say that you are in that situation. Maybe you could just break it down. How would a system like Lutra be helpful in that? Oh, totally. So maybe I would go on. I'll give a real concrete example. So we actually have a customer of ours that they build medical equipment. And they sell cancer screening equipment for dogs. Okay.

And they sell it to vets, right? So clinics around, you know — it's, like, cancer screening, this is non-invasive, it's really easy to use, costing, you know, like 10 to 20, 30K equipment. And they're looking for all these clinics to sell to. And, number one, it's, like, you know, just like coffee shops, but clinics. And they go, man, who should I sell it to? It turns out, number one, you've got to have a pretty big footprint. Like, you know, if you're a very small outfit, you may not be able to afford this. You want multiple doctors in the vet clinic —

you probably want that. And what they realized is that if you're accredited with a particular industry certification — AHA is what they go after — you're much more likely to be a good customer.

And now, all that data isn't available in some database you can just buy and say, oh, give it to me. But if you look up the websites for these clinics, they do mention it, because this is really important to them: these are our doctors, these are our certifications, and everything. And so what they use Lutra for is: hey, Lutra, I've got a list of people that are interested in me. Can you go and look them up? Number one, figure out if they're pet owners or actual doctors. Number two, if they're actually doctors at a clinic, hey, what is their clinic about? Does it have any branches?

Does it have certifications? Do they do these kinds of services for their pets today, like cancer screening and whatnot? And if they do, reduce the list to that. Have they been around for a long time, or have they only been around recently? So you can start asking it to do all that work for you, compile that data, and now say, okay, now I'm ready to figure out — these are the people that I should go after. So if you meet them, if you decide to fly down there, send mail to them, email them, it becomes a very targeted way for you to reach out to them. Yeah.

And then the same thing happens not only for outbound, but for inbound as well, because they also get inbound interest, just like you do. And so they just double-check, do due diligence, on those. So if you have lots of inbound coming in, how do you sort over that? Because you don't want to spend all your time going through the low-quality inbound that may not be real business for you to be serving. And so we say, hey, Lutra, all this inbound is coming in — can you sort it out for me? Can you do this work, figure out if they're part of the group I'm interested in talking to?

So they can do that as well. So data processing, understanding stuff from the internet — that's one thing AI is really good at. We've seen it in ChatGPT. But taking that and inserting it into your process, your workflows, automating that, and getting it into the spreadsheets and emails. - Right, where you can scrutinize it and make sure it's accurate. And you can just do spot tests, just like human reinforcement, like the post-training process.

As long as you can scrutinize the code on kind of the front end, and then you can scrutinize the output of data on the back end, and you can proceed with relative confidence where, you know, one of the challenges I've always had with AI, like how useful is it really if it is a true black box and, you know...

But it sounds like if it's going and gathering this information from the internet and stuff and not just retrieving it from its own weights and stuff, you're going to get far fewer hallucinations. Oh, totally. And I think that's actually critical in that the way I think about the models and my forecast of it is actually that the models are really good if we start to think of them as reasoning engines.

which is put in the knowledge, give it the knowledge, give it the task that you want to do and have the reasoning engine, the calculator figure out how to compute the outcome, right?
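A minimal sketch of that "reasoning engine" pattern: fetch the relevant knowledge yourself and put it in the context, instead of asking the model to recall facts from its weights. The retrieve_documents and call_llm helpers are hypothetical stand-ins for whatever search and model API you use.

```python
def retrieve_documents(query):
    """Hypothetical: fetch relevant snippets from the web, a database, or your own files."""
    raise NotImplementedError

def call_llm(prompt):
    """Hypothetical: send the prompt to whatever LLM API you use and return its text."""
    raise NotImplementedError

def answer_with_context(question):
    # Gather the knowledge the task needs, rather than relying on the model's memory.
    snippets = retrieve_documents(question)
    context = "\n\n".join(snippets)
    prompt = (
        "Answer the question using ONLY the sources below. "
        "If the sources don't contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    # The model reasons over supplied facts, which cuts down on hallucinated answers.
    return call_llm(prompt)
```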

But if you expect the model to memorize and know everything in the world, it's going to mix facts up. It's going to mix wrong decisions, wrong things up. And that's really bad. And so the more we can move towards these things being reasoning engines that accept in their context and their inputs to them, all the knowledge that you care about for this task, the better they'll do. And the interesting outcome of that is that I actually think the models can be smaller

Because if the model is trying to both reason and memorize the world, you need a lot of weights, you need a big network for that. But if you're only doing reasoning, maybe you need a smaller set of weights, not that big. And so this is why there's a trend that we're seeing right now where the reasoning abilities of the models, they're getting better, but not like a lot exponentially better, they're getting a little bit better still. But then the cost and speed of the model is getting way better.

because we have managed to shrink the model size down significantly in the last year to have it do a lot more but be a lot smaller. So that's a trend, I think, that's going to keep happening for the next few years. So we're going to get smarter and smarter reasoning engines, so to speak. Yeah. And I mean, that makes sense from the way we look at education. Certainly in the United States, there's been a big swing away from trying to drill people, like fill people's heads with knowledge. Yeah.

Because humans have incredible reference at their disposal, right? They have Google. They have various databases of old newspaper articles they can search through and stuff like that. They have, of course, LLMs that can help them pull things up with...

with a command like, hey, pull the three most recent articles in the New York Times on this topic for me. Things like that, right? Or they can just go search the New York Times directly if they have access to the article database. Because reference is so ubiquitous and the information is preserved and it's accessible, you no longer have to go dig around in some card cabinet or go around and look through a bunch of microfilm and stuff. It does mean that you can focus more on training kids to be

reasoning engines as well. And that's something I do with my kids. My kids are like, hey daddy, do you know X, Y, Z? And I'm like, no, that's an obscure fact that I can easily look up. And as Einstein, I believe, said (hopefully not apocryphally), never memorize something you can look up. Right. And yes, it's possible that the power grids all go down and we're left without technology, and I'll be kind of screwed because I won't be good at recognizing, you know, a milk snake from a, uh,

whatever the one that kills people is. Right? There are two, like, it's like red and yellow

kill a fellow, or something like that. I can't remember the exact rhyme, but one of the things about growing up in Texas is you have to learn how to differentiate between these two snakes. And I was like, oh, I could just look that up on my phone if I'm, like, camping or something out in the wild. Is this the dangerous one, or is this the harmless one that I can pick up and play with? So there are those kinds of reference questions. Obviously, if you're trying to decide whether to eat wild mushrooms and stuff like that, I probably wouldn't rely on an AI system and, uh, some sort of

you know, visual processing system that's going to like look at a mushroom and tell me, oh, toxic, non-toxic, you know, that'd be cool, but I wouldn't trust my life to it. But my point is, it sounds like what we're doing with models now, where we're treating them as reasoning engines rather than trying to dump the entire internet into them and have them be able to call upon that in a very kind of like,

way, where humans constantly get details wrong and misremember things. You know, firsthand testimony in a trial is considered very weak evidence because humans are so unreliable. For example, in a criminal trial somebody says, I swear that's the guy I saw running out of the bank. But is that really him? The human brain is really fickle and unreliable, is what I'm trying to say. But

combining the reliability of information that was just fetched from the internet, right? As reliable as the internet is with the kind of processing power of like a junior dev level intellect, right? Like it sounds like a very powerful combination. And the fact that you can potentially have a much smaller model that costs a lot less, does it cost a lot less to train it as well? Or is it just like the inference is cheaper? Yeah.

Inference being when you're actually using it. More the latter. I think it's more the latter that the inference is cheaper. Because I think what you've seen so far, maybe just an example, I think GPT-4, the first model that came out,

And after that, GPT-4 Turbo, GPT-4o, GPT-4o mini. If you look at the names, these are all derivatives of the first big model. Right. So what we've found is that we figure out a way to train a very big model and then create smaller versions, offspring of the big model, so to speak, that do almost as well as the big one,

and go from there. So I think there's going to be this pattern in which the frontier labs and the big tech companies train a huge mega model, and then after that, we don't use that model directly; we use it to create the small ones that we actually use in practice. That's what we're seeing out there. In fact, I think the most salient example is probably Anthropic. Anthropic has the Claude series of models, where Claude Opus is the biggest one.

But then if you look carefully today, everyone uses Sonnet, which is the next size down.

And it's the one that we all use, and it's doing really, really well. In fact, I think the latest release of Claude Sonnet 3.5 is actually amazing; it's outperforming on a lot of metrics and in a lot of different scenarios. And so I think we're going to see that trend continue, where you still want to train a big model, but maybe not for the purpose of using it directly. Yeah.
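A toy sketch of the idea behind deriving small models from a big one is knowledge distillation: a small student is trained to match the soft predictions of a frozen large teacher. This assumes PyTorch, and the tiny models and random data are placeholders; the frontier labs' actual recipes are not public.

```python
# Toy distillation sketch: a small "student" learns to match the softened
# predictions of a frozen large "teacher". Assumes PyTorch; models are toys.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 128)           # stand-in for real training inputs
    with torch.no_grad():              # teacher is frozen
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    opt.zero_grad()
    loss.backward()
    opt.step()
```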

Awesome. And so just to be clear, the terms you used, I'm not sure if those are formal terms, but you have the frontier models themselves, the really big, expensive models that potentially cost hundreds of millions of dollars to train, versus the portable models that could potentially run locally on your phone. And for example, Llama has lots of different versions, and I think Gemini also has different versions.

You lose some of the total performance, but you don't lose as much performance as you would think you would lose considering that the model is so much more compact and so much less computationally intensive to run inference. And again, I just want to emphasize for people that don't know that term, my understanding is training is the process that goes into making the model and then

You run the model and ask it to make predictions, and that process is called inference. And that does cost money. Every time I run a GPT query, like, hey GPT, tell me a joke about bananas... My daughter grabs my phone and she uses GPT all the time. She's always asking about Zelda, Tears of the Kingdom. But that does cost money. That was, like, five cents' worth of, you know, compute.

I think one is that, I think you hit the nail on the head, which is training is the process in which we learn the weights. And after that, we fix the weights, we fix what we have learned, and then we just use the model as inference. Now, training is a lot more expensive because usually we have to fit in lots of data. When you're running training, you need to do what they call a forward-backward pass, which is like there's a lot more work needed to set the weights. Whereas inference is like it's frozen. It's much simpler. It's like just put in what you want, get what you want out of it.
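In code, the difference Jiquan describes looks roughly like this (PyTorch): a training step runs a forward pass, a backward pass, and a weight update, while inference is a single frozen forward pass.

```python
# Training vs. inference in miniature (PyTorch). Training runs a forward pass,
# a backward pass, and a weight update; inference is just a frozen forward pass.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# --- training step: forward + backward + update ---
x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = loss_fn(model(x), y)   # forward
opt.zero_grad()
loss.backward()               # backward (the expensive extra work)
opt.step()                    # weights change

# --- inference: weights frozen, forward only ---
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 16))
```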

I would say the way I think about it is, many years ago, at the beginning of the internet era, the storage era, I remember my first hard disk. I think it was, what, 512 or 256 megabytes, this big blocky thing. And I was like, oh my God, I've got 256 megabytes of storage. I bought it, this is amazing, plug it in, download stuff from the internet on my 56K modem back in the day. And now,

It's like free. It's like, you know, like storage data, internet computers, like so cheap now. And I think that's what's going to happen that I'm like super confident in the next 10 years, right? Which is,

the cost of inference, the cost of even training perhaps of these models will go down so much more where what seems like five cents today to do the joke query, it's probably like more like half a cent today or less than that. It's going to become a fraction of a cent in the future that we don't even think about it. It's like something that you don't worry about. And then it's going to be so ubiquitous in that

this technology starts to seep into different things that we use, in ways that we don't even see anymore. You know, it's like we use storage and electricity for lots of stuff right now, but we don't think about it anymore.

And so that's the long-term view, I think, of what's going to happen there. It's going to be fascinating. And I think the models will get smaller, but at the same time, the compute capabilities of our machines keep increasing. Take the iPhone, for example, or the Macs. You've seen how every year there's a new iPhone, a new Mac that is just 30% faster, at about the same price. That's going to keep happening,

right? And if you project that out, the only possible outcome is that these things become much cheaper to use and ubiquitous. And so, yeah, to me, that's really interesting. And I think it speaks a little bit to the future, which is, when I think about AI... I'll ramble a little bit here. Sure, please. There was this part of history where, for the longest time, software was designed for people to use.

The whole field of HCI, human computer interaction, UX design, user experience design was like, how do we design so that people can use software and succeed with it? This is where the buttons are. This is how the affordances should be, where you can click, so on. Now I think with AI, there's a first step, which is what I call the co-pilot phase, which is that even though software is designed for people to use, it's very hard to design good software.

If you use SAP, or things like Salesforce, they are really complicated, hard-to-use systems. And so the first step of this, I think, will be co-pilots, in which the AI is helping you: this is how you use the software, here's where you should click, let me click for you there. But then as these models, as we talked about, get cheaper and cheaper to use, there's going to be another wave coming up where software

is no longer designed for humans to use with AI assisting; software is directly designed for AI to use. Now the humans go to the AI and say, hey AI, I want to accomplish X, I need you to do X for me. And then the AI goes, okay, how should I do X? Well, I can poke around the UI, but if the system was designed for the AI to use it,

That's very different. And now the AI can take many actions, do many things in there. And so I think my hypothesis is that we are at the beginning of a paradigm in which we are going to design software for AI to use, which is very different to think about, and then have the AI human interface, AI person interface,

maybe still be more conversational like. So you talk to it, you get it to do things. It needs to have ways for you to understand what it is doing. But behind the scenes, we don't need to have people manually pointing and clicking on things because that's not exactly the value. The value we bring to the table isn't pointing and clicking. The value we bring to the table is what do we want the machine to do?

What is our goal? Do we like the output of the machine? We have taste. We have all this context in our minds about what's good for the world. The AI doesn't know about that. And so what we bring to the table is that. And I think the next 10 years, in my view, is a paradigm shift on that software design layer.

Where software designed for AI, and AI-computer interfaces, is going to be a whole thing of its own that we're still figuring out. And then AI-human interfaces, which is how we as users give the AI instructions, how we're able to see what it's doing, and how we make sense of it.

Yeah. Yeah. That's exciting. I want to dive deeper into some of your other predictions, because obviously we can think about, okay, what does it look like in a world where humans are mainly looking at the software that the AI has created, approving it, and saying, okay, go ahead and proceed. Yeah.

Maybe the AI is like, hey, permission to get $20 because I need $20 to be able to get this account so I can do this thing so that I can do X, Y, Z. This goes into the notion of agents. And I want to talk briefly about agents and how they work and where they're headed. Maybe you could just give a very quick overview of what an AI agent is and how

Whether you agree with the conventional definition of what an AI agent is or whether you think that what we have seen is just a tip of the iceberg in terms of what is possible. And then, yeah, how much autonomy should we give these agents? How much of the human work that we're currently doing could essentially be delegated to agents? Just...

I know you have very educated perspectives on this. And instead of trying to tease it out one question at a time, just...

Go for it. Tell me what you think about AI. At a high level, I think number one, it's probably overhyped in the short term. The next two years, we're going to see it's overhyped. But I think it's underestimated in the next 10 years. Bill Gates is the one that said we tend to overestimate two years of progress and underestimate what we can do in 10. So I think AI is in the same space in there where there's a lot of potential. Now, I think number one,

It's a term that I think is very confusing, because the word agent comes with a lot of connotations to it. Everyone has a different way of defining it. If you go to the technical community, they'll define it in some technical sense, like it needs to do function calling, it runs in a loop, da-da-da.

If you go to a different community, then it's like a completely different terminology there. Yeah, I think like a real estate agent or travel agent or somebody who's doing something on your behalf that you're interfacing with who has an expertise in how all the different flight systems work. Not that I know of a lot of travel agents. I haven't been to one in decades. But yeah, that is kind of what I think when I hear agent. Okay, interesting. So in my mind, that's actually pretty good. I think in my mind, it's really a degree of autonomy.

Right. Which is, how much babysitting, how much do you have to be there with the AI, so to speak, to get it to do a task? So in the classical pre-AI days, you're the one doing the task, clicking around, getting it done. Now with the first versions of AI, what we call co-pilots, they could do very small steps. They could make one mouse click for you. They could call one function. They could do a little bit for you.

And then I think where people see the dream of AI agents is you can go all the way out to the ultimate, which is I have something that I want done. I need to find more customers for my company. Go find those five customers and book meetings for me. That's the ultimate goal in some ways. But then that's also very difficult in the sense that that's a very tall order task that

requires a lot of context. What does it mean to do this task? How do you do it well and not just do it haphazardly or wrongly? And also, what is success? And then how much do you engage back with the user? Which is something we think a lot about: it's not just getting the task done, but getting it done correctly.

I think the way to get there isn't necessarily to try to jump to the end and make it work. It's more about seeing where we are today. It's almost like self-driving cars. Do we say it only counts if it can autonomously drive in every city, everywhere in the world? Or do we say, hey, can we drive maybe just on highways first?

That's pretty good. Now let's do city driving, then very busy city driving, then hills and everything. You can see it even in the rollouts for Waymo or Tesla, just how they roll it out. So we're going to see agents take that form. Instead of a black-and-white, oh, co-pilot today, agent tomorrow, we're going to see this slow, gradual shift in which I can now delegate a task that took me five minutes

to the computer and it does it well. Now I can delegate a task that takes me an hour. Oh, it does it well. Now a task that takes me a week. Does it well? A month? Does it well? As we keep going, time is one of the dimensions I think about for how much you can delegate away. And the bigger and more ambiguous the task is, the harder it is to make an agent. Now, my own version of what an agent is, how I define it,

I think of it from a more user-centric point of view, which is how do I engage with the AI? Am I giving instructions like, "Okay, now click left, now click right." So sometimes you see people do AI demos in which the demo has this big prompt,

and tells it exactly what to do at every step, clicking here and everything. Oh, I made an agent that does this for you. To me, that's not exactly right, because you're giving it all the instructions. What you really want to say is, find me five coffee shops in San Francisco that have this coffee machine, because I'm trying to sell them that equipment.

Go, right? And then it's able to figure out from those instructions how to do it and get it done, da da da. And we're going to start being able to give it higher and higher level commands. Over time, what's critical is for a system to have a few things. Number one, it needs to understand how to use the tools in the world: how to use the web, how to use your spreadsheets, how to use your emails. Number two, how to interpret your commands into interactions with those tools.

Number three, it needs to build in the OODA loop we talked about: how to observe, orient, decide, and act. That's super critical. And it needs to figure out the right balance on how much to ask the user for clarification and help, and how much to just keep going on its own. Because the last thing you want is for it to spend two days working on this, burn up $100, and do something completely wrong. Yeah. And you're like, damn, what's wrong with you? So what do you want? You want it to check in.

But if you check in too frequently, it becomes really annoying. Because it's like, stop checking in on me. I could have just done this myself because I spent so much time overseeing you doing it.
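A stripped-down sketch of that kind of agent loop, with a crude check-in rule, might look like the following. `call_llm`, the tool set, and the JSON action format are assumptions for illustration, not how any particular product works.

```python
# A stripped-down agent loop: observe, orient, decide, act, and check in with
# the user when a step looks costly or irreversible. `call_llm` and the tools
# are placeholders, not a real API.
import json

def call_llm(prompt: str) -> str:
    """Stand-in: assumed to return a JSON action like {"tool": ..., "args": ...}."""
    raise NotImplementedError

TOOLS = {
    "web_search": lambda query: f"(search results for {query!r})",
    "update_sheet": lambda rows: f"(wrote {len(rows)} rows)",
}

def run_agent(goal: str, max_steps: int = 20, spend_limit: float = 5.00):
    history, spend = [], 0.0
    for _ in range(max_steps):
        # Observe/orient: show the model the goal and what has happened so far.
        action = json.loads(call_llm(json.dumps({"goal": goal, "history": history})))
        if action["tool"] == "done":
            return action["args"]
        # Check in instead of acting when the step is expensive or irreversible.
        spend += action.get("estimated_cost", 0.0)
        if spend > spend_limit or action.get("irreversible"):
            if input(f"About to run {action['tool']} (spend ${spend:.2f}). OK? [y/n] ") != "y":
                return "stopped by user"
        # Act, then feed the observation back in.
        result = TOOLS[action["tool"]](action["args"])
        history.append({"action": action, "result": result})
    return "step budget exhausted"
```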

Yeah, there's the saying, if you want something done right, do it yourself. And one of the reasons for that saying to become popular, which I largely embrace, by the way, I do almost everything myself that I can't just cleanly delegate to a team member or something like that. That's right. It's because you do have to invest a lot of resources in overseeing and making sure it's done correctly.

And if it's not done correctly, certainly in the case of certain things, that's incredibly important, right? Like if I delegated payroll to somebody and somebody was sending out like

payroll for me and they made a mistake, then it's somebody who didn't get paid on time or it's maybe a wire that we have to reach out and somehow figure out how to get it back. It's not easy to get international wires returned. Things like that, right? So if the stakes are high enough, then it doesn't necessarily make sense to delegate it to a system. So how do you get that? So I think what's really critical to understand here is that it's...

not just delegation, but delegating right. But what is "right", so to speak, is in the eye of the beholder. The way you do payroll is different from how I do payroll. So you can't have one AI system that does payroll for everyone; that doesn't make sense. You almost need, just like with a person, a training session. How do you teach an AI system the procedure, the way to do something? And once it learns it,

Can it do it repeatedly, reliably moving forward? Can it react to the world accordingly and come back to you when things are unexpected? Make that happen. So at Lutra, one of the big things we think about a lot is actually that, which is like procedural memory, training a system. And I think that's a very critical aspect of it. I don't think what you'll see tomorrow is like, oh, AI agents everywhere doing everything. No, you're going to see AI systems that are learning,

They're able to come in and see how you do things. You have to teach them, you have to show them how to do something. Then they'll do it reliably. And this is where I think the gradual change will happen, where you start teaching them very small tasks. Like, these are my emails. Every day I want you to go through all of them, do this kind of research on every inbound, and let me know about only the few that are really important. Day one, you're going to be training it. Now that's pretty good. Day two, you say, hey, wait,

now that's really good, but I still want you to start drafting replies in this format. Here are my guidelines. Follow the guidelines every time you draft for me. That's pretty good. Oh,

these look pretty good. Every single email, I just send without editing it. And then you're like, oh, just send them for me now. So you start to reach in a little bit, step by step, and get it to do more. But it isn't going to be one of those overnight, everywhere things, because the way you want to use your AI and the way I want to use my AI are going to be different. So that degree of training and personalization is going to be critical, I think. Every company has a different setup. And so the more accessible the technology is for people to mold

into their own use cases, the more critical it is. So I think that's where I see agents going. Very small tasks initially, but rather than an overnight shift where it's everywhere, we'll see that it's just so easy to delegate more and scale it up, and it starts to seep into every single part of our lives that way.
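One plausible shape for the "procedural memory" idea is simply recording the steps of a taught workflow and replaying them later. The step format and the `run_step` dispatcher below are hypothetical, just to make the concept concrete.

```python
# Sketch of "procedural memory": once the user has walked the system through a
# task, save the steps as a named procedure and replay them on new inputs.
PROCEDURES: dict[str, list[dict]] = {}

def teach(name: str, steps: list[dict]) -> None:
    """Record a procedure, e.g. steps captured during a supervised first run."""
    PROCEDURES[name] = steps

def run_step(step: dict, item: dict) -> dict:
    raise NotImplementedError  # would dispatch to a tool or a model call

def replay(name: str, item: dict) -> dict:
    for step in PROCEDURES[name]:
        item = run_step(step, item)
    return item

teach("triage_inbound_email", [
    {"tool": "research_sender", "args": {"field": "from"}},
    {"tool": "classify", "args": {"labels": ["important", "ignore"]}},
    {"tool": "draft_reply", "args": {"guidelines": "my_reply_guidelines.md"}},
])
```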

Yeah, well, that's interesting to think about how this technology, this AI agent, if you look at it as a discrete technology from just LLMs you can prompt and query and different AI-powered tools, if you will, like taking an image and automatically expanding it to a certain resolution using AI to figure out what the surroundings would likely be or just extend a wall to have the same wallpaper as what's behind you. I mean, that's pretty...

Those kinds of things are going to come about for sure. But agents, I would view as a discrete...

Kind of like building on top of it. The next big step in my mind would be going from delegating a very specific task to an AI, to delegating a very abstract, complicated task that's going to involve a whole lot of reasoning, multiple steps, and probably also checking in frequently. And so just to recap some of the things you said,

The system has to understand how to use the tools of the world that people use to get things done, the web, email, spreadsheets. It needs to understand ways of reasoning and understanding the world, such as the OODA, which again stands for Observe, Orient, what was the third one? Decide, yeah. Decide and Act. Act. Yes. Observe, Orient, Decide, Act. Okay.

Okay. And then it needs to also have a feel for how frequently to check in with the user. And some users are going to be more hands-off than others. So this is probably a high degree of personalization. Everybody's going to have their own way of wanting to get things done. And people will say, no, don't do it that way. That's like for amateurs and other people like, oh, that's fine, whatever. You know, so like the way that like an architect or a senior software engineer or a CTO or like a, you know, head of engineering at a, at a,

Fortune 500 company, they're going to think about this very differently from somebody running a tech startup in San Francisco or something like that, right? There's going to be a broad range of approaches and tolerances for failure. A data breach when you're a hackathon project and you just got a few of your friends to sign up is not a big deal. A data breach when you're a healthcare provider has massive legal implications.

Right. So, different considerations like that. I can see some of the many dimensions along which these agents are going to need to be varied and personalized. So that's going to be, you know, potentially a multi-decade process before these are really mature agents.

But we already do. I mean, in what you're building and what a lot of other people are building, we see kind of like the origins, the beginnings of this process, which is really exciting for me.

Yes. It's super exciting. We see what people can do with our platform and we get super excited about it. Yesterday someone was... oh, this is pretty public, they put it on a Reddit thread. They're like, hey, I have all these books (they're running a books-type startup) and I need to get all this information. And clearly they're not that technical.

And they could prompt Lutra and, I guess, tell it, hey, can you figure this out for me? Just type it in there. And then you can see the machine start reasoning about things, going through different steps, and you go, oh man. And then it figures out how to do it at the end.

And it's just amazing to see when it works. In fact, one of our users a few weeks ago was messaging me and told me that he gave Lutra a pretty ambiguous command. He had a bunch of companies to research and he was like, oh yeah, can you just go and figure something out about them? It wasn't very obvious what to look for, but Lutra did a single web search, found data, and then went, wait, that's the wrong data, I should try a more specific web search. So it rewrote the prompt

to its underlying web search engine, ran it again, and got the right results. So what we're seeing here is that these models, given the right scaffolding, the right environment, they can code (you see that a lot these days), they can write their own prompts, they can start reasoning about them, and they can run in this reasoning loop that iterates and makes things better. Now, I think there are still a lot of challenges for us developing in this space. One is that

the models sometimes do go into loops, you know, and get stuck. Yeah, and we've seen lots of comical examples of models getting stuck in loops and just running up huge compute bills. And I'm sure we're at the very beginning, and there are going to be lots of circuit breakers and things like that figured out. Exactly. And then context windows are limited; they're not that big. And so a lot of design goes into what I call the scaffold around the model, where the model is kind of like the reasoning engine brain.

You just don't want to give it everything, all the context, and say go, go, go. You want to be thoughtful about what you give it and how you give it, so that at every step it's reasoning about a smaller part of the puzzle and not trying to boil the ocean. And maybe one of the concepts we apply here is what I call levels of abstraction. So if you're trying to get a model

to, say, figure out which pixel to click on while you're browsing something, versus a higher-level task like figuring out how to accomplish a goal (I need you to go to this website, log in, get this data, and so on), those are very different levels of abstraction, right? One of them is high-level reasoning on how to accomplish a goal. The other is very low-level reasoning on, hey, what specific place should I click, what APIs should I call, da da da.

And one of the things we spend a lot of time designing is how to have models operate on just one level of abstraction at a time, rather than both. It's almost like, as a person, if you sit down and say, okay, fix all the grammar in this writing, you can go through and scan the grammar. But fix all the grammar and, while you're fixing the grammar, also figure out whether the whole high-level story plot makes sense? Those are two different things. You can't do both at the same time.

Right. And so figuring out the right level of abstraction to work with is actually a very interesting design problem when you work with these models.
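A minimal sketch of keeping each prompt at one level of abstraction: a planner prompt produces steps, and a separate executor prompt handles one concrete step at a time. `call_llm` is again a placeholder, not a specific product's API.

```python
# Sketch of "one level of abstraction per prompt": a planner prompt breaks the
# goal into steps, and a separate executor prompt handles one concrete step at
# a time. `call_llm` is a placeholder for a real model call.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def plan(goal: str) -> list[str]:
    # High-level reasoning only: no clicks, no API names, just steps.
    reply = call_llm(f"Break this goal into short, concrete steps, one per line:\n{goal}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def execute(step: str, page_state: str) -> str:
    # Low-level reasoning only: given the current page, pick one concrete action.
    return call_llm(
        "You are operating a web page. Current page:\n"
        f"{page_state}\n\nCarry out exactly this step and reply with one action:\n{step}"
    )

def run(goal: str, page_state: str) -> None:
    for step in plan(goal):
        action = execute(step, page_state)
        print(step, "->", action)
```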

So you're saying that you might send it down several tracks, like have one thing thinking about this while another thing deals with that. Because, like you said... I've copyedited literally thousands of articles. I was the primary editor, the only editor, for freeCodeCamp's publication, which has published 12,000 articles, for several years. Now, that said, I've probably only edited a few thousand, and then we handed it over to Abby, who's...

doing all those things you just described. She's reading through the articles. She's catching grammatical mistakes. And while she's reading, she's kind of re-sequencing the words and tightening up the verbiage to make it closer to a sixth grade reading level so it's easier for non-native English speakers to understand things and just

you know, breaking things down into smaller paragraphs. All those things that you do as an editor if you actually want people to read your work. If you're writing an academic journal, you don't care about that stuff. But if you're writing something that people are reading on their lunch break, you really do want to make it accessible, right? And if it's having a broad... So...

So that would involve like several different models potentially working in parallel or handing things off to one another. Like here, go do this. Like, okay, can you bring me back this and then I'll take it from there. So you're talking about maybe having several – would this be the same agent that's making use of several different models or would it be like prompting itself in a different way to put itself in a different state of mind? Yeah.

So the way I think about it is, from a model perspective, this could all be achieved by the same very big model with different prompts. But the prompts themselves... again, think of them as calculators on words. If you're trying to do two calculations at the same time, it's very confusing what the output should be. If you're doing only one calculation, oh, that's pretty straightforward, that should be the output, right? So

you can always have it be two prompts: one prompt does X, one prompt does Y. The calculator will work on both prompts; just don't try to do two computations simultaneously, because that's confusing. Now, you might want to use different models once you realize this is working really well: can I be more efficient, maybe even better quality, faster, cheaper? And this is where you want to build specialized models.

Now the prompt is locked in, the task is well defined, you know you want to optimize this, and you don't really care about this calculator working on any other prompts. So why have a big model sitting around that can do other work you don't care about? This is where you say, okay, let's fine-tune that down into a smaller model that just does this one thing. And the benefit is that it's going to be faster, cheaper, and better when you do that.
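One common way to do that specialization, sketched below under the assumption that you can log the big model's outputs: run the locked-in prompt through the big model to build a fine-tuning set, then train a small model on those pairs. The file format and helper names are illustrative only.

```python
# Sketch of specializing a small model once a task is locked in: log what the
# big general model produces for the fixed prompt, then use those pairs as a
# fine-tuning set for a smaller, cheaper model. The model call is a placeholder.
import json

def call_big_model(prompt: str) -> str:
    raise NotImplementedError  # the expensive general-purpose model

FIXED_PROMPT = "Fix the grammar in the following text, changing nothing else:\n\n{text}"

def build_finetune_set(examples: list[str], path: str = "grammar_finetune.jsonl") -> None:
    with open(path, "w") as f:
        for text in examples:
            prompt = FIXED_PROMPT.format(text=text)
            completion = call_big_model(prompt)
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
    # The resulting JSONL can be fed to whatever fine-tuning pipeline you use
    # to train a small model that only ever does this one job.
```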

So that's a thing that people do do when you get to that point. I have a lot of rapid fire questions. The first builds on what you were saying about having a very specialized model. You have worked at Waymo, which is Google's self-driving car division, right? Like, have you worked there? Yeah.

I worked at Google Brain, but I was involved in lots of collaborations with them. My name appears in a bunch of the papers they published. Cool. So that is kind of a specialized task, just navigating a car around the physical world, right? In three dimensions. You talked earlier about cars having to deal with gradients and things like that.

Lots of different, not literal gradients, like driving around San Francisco, everything's like a crazy hill, right? Not just gradient in terms of gradient descent and those kinds of machine learning concepts. How do you think self-driving cars are coming along, and would you consider that to be kind of like

a problem that benefits very much from LLMs. Do you think that LLMs will help improve self-driving?

Number one, I think they're coming along amazingly well. If you've been to San Francisco recently, you've probably seen people riding Waymos all the time now. It's pretty amazing how far it has come. It's one of those things, again, that's not a step-function shift; it slowly seeps in, more and more people ride it, and it gets better. And Teslas have also seen a big improvement there. LLMs? Huge role, in two ways. One is that, architecture-wise, we talked about how this architecture, this model, allows you to take in different modalities.

So a self-driving car has a lot of sensors, right? Being able to put in all of that data and make sense of it: huge. Being able to predict not just where the objects in the world are, but what to do: there's also a lot of machine learning involved in behavior prediction, planning, and so on. And the LLM-type technologies are actually really well suited to be a foundational piece in those predictive models. But on top of that, what's really interesting is when you start combining language models, like actual

natural language, together with the driving systems. One thing we're starting to see, and there are a bunch of papers on this (Wayve is a company that's published on it), is that one of the problems with self-driving car models is that you often can't figure out what the model is trying to do. It's like a black box, right? It's predicting that we should turn left now and not, say, run into a car, but why? It turns out there's some cutting-edge research that says, attach a language model to that,

and now the car can not only predict what to do, it can start to emit its reasoning. Like, oh, I'm nudging over because I see a pedestrian there, and it looks like a pedestrian that might be moving quickly, so I'm going to be more careful here. And so a lot of cutting edge... You're kind of giving the car a conscience, a stream of thought. I don't want to say consciousness, because people might think it's sentient. But basically a way of articulating its decisions and...

And maybe even reasoning through those decisions, like the classic trolley problem. Let's say there's one person here and there's three people over here and the brakes stop working. Which group of people do we hit? Obviously, that's a very macabre kind of example. But to some extent...

having it plugged into an LLM would give it, you know... Yeah, you hear those kinds of things. I think what you get is a bit more insight, which helps in two ways. One is that the engineers working on this can maybe debug it more, figure out what's going on. It's also kind of useful from a UX perspective. I'm still a bit up in the air on how critical this is; I think there's a lot of work that needs to be done, but it's very cutting-edge research I've seen come out recently from different groups.

But at a higher level, I think we'll see a blend of different things. I think we will still see very specialized models. We really want to get pedestrian detection done right,

Right? Because that's super critical to us. So let's have models that are very focused on making sure we catch all the pedestrians and have metrics around that and get everything recorded. If you're in a busy parking lot, don't miss anything there. And then we have models that are more generic. Take in all the sensor data in the world and predict where we should drive.

right? Where to go, how to steer, all of that. And then we have more classical robotics methods behind the scenes: okay, given a driving path predicted by the AI, how do I control the actuators to actually drive? That's classical robotics; we don't need AI for that. We just solve the equations and figure out a driving control that is smooth.

So you kind of need all the components working in tandem really nicely. You get a very smooth driving experience that's not jerky, because there are systems in there that manage the jerk and everything. And you get these guarantees, this confidence, that when we see pedestrians, especially little kids and whatnot, the system is going to

do the right thing and keep an eye on them. And then you have the AI system for all the edge cases in the world: sometimes you drive down a street, there's construction happening, there's a guy waving a sign, go, no go. You can't hard-code that. So you need a system that learns from data. In fact, the best example I have: you know how at some crossings they have people who hold up stop signs to help the kids cross the road during school hours?

There's one example I remember: the guy in charge of that took the sign, put it into his backpack because he was done for the day, and started biking home. So there's this stop sign that's moving. If you had naively hard-coded that, the car would be like, stop, stop, stop, this sign keeps moving, you're never going to go. But if you look at it from a human perspective, obviously that stop sign is not active,

right? So it's an inactive stop sign; you shouldn't be stopping for it. Now, how do you figure that out? The edge cases in the world are infinite, but with enough data you start to learn generalizations. And what's really cool about these models is that you can not only get data by going out in the world and recording it on the cars, you can also say, hey, go read the entire internet of data, which has a lot of this stuff in it.

I'm sure you can Google things about stop signs, about the rules and everything, and use that. Yeah, read the traffic handbook people read to get their license. Exactly. And then take what you read and understood about the world, that general knowledge, and use it as part of the model that powers the car. So now you're transferring knowledge from different sources into it. So maybe to your question of where the LLM comes in, I think that's one of the places that gets really exciting, where we can transfer knowledge

between systems, and not just rely on, we've got to collect all the car data and use only that. The world knowledge can come into the puzzle here. Yeah. Yeah. So a couple of very quick questions. Give me your bear case and your bull case for how you think LLMs will unfold over the next,

I realize that's a really long time and it's hard to predict that far. But a lot of people are like, oh, there's going to be diminishing returns and there's going to be more money chasing, less and less gains and stuff like that. What do you think is likely to happen over the next five years with LLMs? I'll give you my case and then we can talk about the two extremes. The average case which I think is going to happen is that, so one is that I think we're on an S-curve.

Okay. Which is, many people look at LLMs and go, woo, exponential growth, off to the skies. No. I'm like, actually, no, we're more on this curve where it goes flat, then up, and then it's going to flatten out again, like an S-curve. But we're on many S-curves. So this is where the bear and bull thing comes in.

So I'm a bit bearish on reasoning abilities going off to the races and, oh my God, this is going to be super AGI, I can reason about anything. I think we're more on the flattening part of the curve for reasoning, which is, okay, we're running out of data to reason on. The LLMs are really good at predicting the next word, but unless we get a lot more reasoning data, that's hard, right? And there are some strategies for how to get onto the next S-curve there.

Now, we're at the beginning of the S-curves for a bunch of other things. We're at the beginning of it for video generation and the 3D world, understanding how the 3D world works; that's at the beginning of its S-curve. So video-based models and everything still have a lot of room to get better. Image models, I think we're in the middle of the S-curve there: image models are really good now, but they have a ways to go before we reach the top of the S-curve

up there. And so the way I think about it is bear and bull cases. Bear case: we just hit the tops of all these other S-curves and go, man, can't make more progress there. Bull case: we find new S-curves to get on. We find a new way to do reasoning, a new way to do this and that, and the models get increasingly better. That's from a capability point of view. Now, from a cost and speed point of view, I'm just extremely bullish.

I'm not sure there's even a bear case on that one: cost and speed will just get better, almost definitely, as the chips get better, as production increases, as we keep getting economies of scale. There's a lot of infrastructure investment right now. So things will get faster and cheaper. We know for sure we can make models smaller and still perform as well, and I would say we're just at the beginning of that S-curve, or maybe the midpoint.

So smaller, cheaper, faster. That part I'm very bullish on. Okay, so smaller, cheaper, faster. I just want to recap here. So you think, in terms of the critical thinking and reasoning of LLMs, we might be toward the top of that particular S-curve, though there might be another S-curve that's unlocked by some new discovery, like transformers were a really big discovery that powered pretty much everything we've talked about today. But one thing you're very confident about is...

the price performance and the actual possible speed will continue to grow. And you don't think that's an S-curve. You think that that's maybe something more similar to like Moore's Law or at least like a linear kind of like

How would you describe it? If you had to wave your hand, what would the curve look like? Oh, it's still an S-curve. It's definitely an S-curve, but we're at the beginning of it. Okay, we're at the beginning of the price-performance and speed S-curve, and the models-getting-smaller S-curve. The models-getting-smaller S-curve, yes, but we're at the beginning of that. And I think part of the reason is that there's this notion of something being so easy to parallelize.

Some things are embarrassingly parallel, so to speak, in computing terms. Parts of the system, like transformers, have that nature. You can parallelize things; attention has a lot of parallelism in it. And because of that nature, you can throw more chips at things and things get faster. That's number one. And number two, I think, is that as we start to lock down and understand, hey, what's the architecture that really powers this?

The more specialized you are in figuring out how the compute should be made, the more specialized chips you can produce that are just designed for these computations. And I think we're going to find probably a few more paradigms after the transformers, but this paradigm is pretty good. You don't need to create a computing chip that can do all the other stuff that you care about. Just say, hey, focus on inference and do that. In fact, at Google, the first TPU chip, the first chip they designed for tensor processing was designed for inference.

Because they realized that, hey, we can train models, but inference is going to be the hard part. So the first chip they designed was all about that. Then in the next versions, they made it more and more flexible. And so from those two perspectives, it's just going to get cheaper. I think there is going to be some physical ceiling on some things, maybe on cost, but that's over a long span of time. Our phones, our machines, our devices, everything has still followed that trend so far.

Yeah. Yeah. One other... Oh, I'm sorry. Go ahead. No, no, go ahead. I was going to say, maybe the thing that will change is that with Apple devices (I use an iPhone, right?), as they get these neural processing chips in there, what's interesting is that some of the latest advancements start to only be usable on the latest phones, the latest devices. So I think we'll start to see this thing where, yeah, there's going to be...

you might want to be on, you know, this new set of devices where the chips are designed for machine learning, for neural network inference. You don't think it's just hype, like putting AI chips in everything? You think there won't be? No, no, no. I mean, take my personal use of AI, for example: I use ChatGPT voice mode a lot these days, where I talk to it. When I go on a long drive to work, I talk to it and it writes notes, creates notes for me.

Now that's going now from me to the phone to the internet. I can totally see a world in a few years where that's running just on my phone. The internet's not involved. The chips on our personal devices will get powerful enough to start to do a lot of this work for us.

Yeah, very cool. One other question I have: open-weights versus proprietary foundational models, like the GPT-4s versus the Llamas of the world. How do you think they will compare in terms of capabilities? Do you think Llama will ever catch up, or do you think it's going to perennially be six or eight months behind? How do you view that race? Oh, let's see.

Number one, I would say big kudos to Meta, to Mark and his team out there, for doing Llama. I think that really changed the dynamics of the ecosystem in a good way for startups, a really good way for innovation. So I really appreciate that. Number two, I don't think they will always be that far behind. Meta is one of the

biggest players out there in terms of GPU chips; I think they have hundreds of thousands of chips. Very few people have that many chips to use. And if they're willing to keep doing this and keep investing in it, they have all the data as well, right? They have a lot of data they can train on: image, video, internet text data.

And I think they have all the capabilities and abilities to train these models. There is very little, I would say, secret sauce in how these things are done. And a lot more is in engineering challenges in scaling up the systems to train such big models in a way that's reliable, scalable, fast, and so on. So I don't think there's anything specific from a secret research point of view that they have to be six to eight months behind. Now, there's a lead right now in some of the labs.

That's definitely true, but I expect the lead to shrink over the long term. In fact, I would venture to guess that at some time horizon, open source might even move faster than some of the labs' closed source. Because as innovation goes, I think the rate of innovation really depends on how fast people in the world, in teams, can try out different ideas and see what works.

And even on the image model side, when we saw early versions of models coming out, people were training what they call LoRAs, which are low-rank adapters, smaller models, mixing them, doing really interesting things across the community, sharing them out, and saying, this is how you can adapt these things. It was really fascinating and fast to see how that happened. And I think open source will have that kind of flourishing. Now, the hard part for open source in this world is that

the compute requirements to train a model are very high. So even though it's open source, it doesn't mean it's easy for someone to download it to their laptop and just work with it and experiment, especially on the large-model side. And I think that's the part that makes it harder to keep up that pace of innovation.
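For reference, the low-rank adapter (LoRA) idea mentioned a moment ago can be sketched in a few lines of PyTorch: the pretrained weight stays frozen, and only two small matrices are trained on top of it.

```python
# Minimal sketch of a LoRA (low-rank adapter) layer: the big pretrained weight
# W stays frozen, and only the small low-rank matrices A and B are trained, so
# y = W x + (alpha / r) * B A x. Assumes PyTorch.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))   # only A and B receive gradients when training
```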

But there's nothing, I think, stopping open source from catching up in some ways to closed source models. I don't think there's any secret sauce in the world that they know something that the rest of the ecosystem doesn't know about. Yeah. Awesome. One last question. This is the final question.

So up until now, it's mostly been people using AI with publicly available information, right? Like the massive corpus: every pirated book and movie, everything that's been dumped into every Reddit thread, every freeCodeCamp help tutorial. We've got, I think, hundreds of thousands of...

forum threads that are likely in a lot of these models and things like that. But it's been things that have been publicly accessible historically, and there hasn't been a lot of personal information that gets integrated into it other than what you feed in the prompt.

How much do you think the utility of these models is going to change when people start giving, you know, potentially read access, maybe write access to their calendars or to their email and things like that and being able to pull in a ton of personal information in addition to the massive corpus of information that's already in these foundational models? How much of a productivity unlock do you think that will be?

I think it's going to be massive. I think that's probably the biggest barrier right now to us using AI more. Because if you have to copy and paste something from your email, from your internet into ChatGPT to get it to do something, you're not going to do that. It's too much work. But if this is natively working with your data, with your email spreadsheets, your documents, Google Docs, Drive, and so on, with your PDF files, you just say, hey, go look at that file. Go find it. In fact, don't even tell me. Just go find it.

it's going to be massive, right? And I think right now we are just scratching the surface of how we can use AI on a day-to-day basis. Part of the reason is that it's not well integrated into the app ecosystems we use, right? So to your question: it's massive. And where we see that changing is that a lot of things we do on a day-to-day basis involve getting data from some system, maybe email, spreadsheets, and whatnot,

working with it, transforming it in some way that's meaningful, maybe writing a reply draft and then sending it out. So the more we can insert AI into those sequences (get data, use AI to help with the intermediate step, push it to another system), the more productive we'll be. We can have AI start to take on a lot of the first drafts, the first steps and actions. It's much easier to

give feedback and edit than it is to create. You know, it's much easier to be a critic than to create something. Yeah, absolutely. Exactly. So I think that's the core piece there. The more we can inject this technology into all our workflows, the more we're going to see huge productivity improvements. The hard part in all this, I think, goes back to

the design of the systems around us. We haven't designed our systems for this, right? Our software isn't designed for AI to sit in the middle and do stuff for us. So there's a lot of retrofitting happening right now to make that happen. But I think increasingly people will go, wait, what if we designed it for AI first? What would that feel like?

And so I think maybe in coding, this is where we've seen AI and engineering take off, where there are IDEs like Cursor, which we also use ourselves in-house, where it's just natively part of the coding experience where you're coding. And rather than just be coding and say autocomplete, you go like, hey, I need to make this change across these five files. Can you go propose a change? You walk away, you get coffee, you come back. It's like, hey, look at the proposed changes. That's good. Accepted.

So the more it's integrated in natural ways, the more we'll use it. And I think that's the exciting part of the future. So the main unlock is...

is just reducing the friction to use it, where you don't have to copy information and you don't have to explain as much, because the AI system benefits from the context of having read your emails and knowing whom you're emailing and corresponding with and all this stuff. And it has all that in its working memory, so to speak. So assuming the context window is large enough, it can have a pretty good understanding of what your goals are and how you're going about achieving them.

And that can be a starting point for it to get things done on your behalf. Totally. And to recommend a course of action. And what you said is very important. It's much easier to critique than it is to create, right? There's the thing. Everyone's a critic. It's much easier for me to listen to some pop song on the radio and, eh, I don't really like this, than it is for me to create a pop song that I do like, right? That is like a massive undertaking, right? So I definitely think that's really cool. The notion that, like,

There's this huge unlock that's relatively easy to implement and is going to be rolled out over the next few years. I mean, we already talked about Apple with the Apple AI. Obviously, Google is one of the best positioned in terms of having access to a whole lot of people's data. I imagine Microsoft, like all these different tool ecosystems that people are doing business in, they have...

the potential to just drop AI in there and suddenly things get done a lot quicker. And of course, I'm going to set aside the privacy considerations and the security considerations right now because that would be a whole other discussion. But it is exciting to know that should we be able to strike a right balance with privacy and security, there's a tremendous amount of value lying on the other side of that in terms of just reducing the amount of tedium and boring kind of manager-minded work

that we had to do so that we can focus on, I guess, more deep work where we're actually creating stuff instead of corresponding about creating stuff. Yeah, I can see it being a big game changer. So...

I'm happy to hear you echo that sentiment and actually put a whole lot of additional detail on that feeling that I've had for a while, that it would be cool to potentially use AI in a very careful way initially, but to help get things done. I would say the security and privacy stuff, if you abstract it away and not view it as another human, like, hey, this is just a calculator.

you know, we don't worry that much about the security and privacy of a calculator these days. That's right. It's just a calculator. And so what we need to think about is like, is this data going to be used for training somewhere? Right. If not, it's literally a calculator. It's like words in, words out, numbers in, numbers out. That's what it is. Right. I mean, you'd have to give it access to like,

passwords and stuff. I mean, you might be able to just do that abstractly through like a password manager. Yes. And it has a key and it uses that, so it doesn't actually get exposed to the actual password. So I agree. There are lots of layers, like fail-safes and, I guess, walls that you can put up to prevent that. Exactly. I think it comes down to a lot of system design and like how do these systems work?

When they take actions, what they have access to. So for example, for us, when we design our system, Lutra works with OAuth directly. So it never actually sees your credentials. In fact, it never sees the keys. All it sees is like, I can get a spreadsheet. I can read it. I can update it. It doesn't see the underlying implementation or like, oh, this is how we get a secret key and token and use it. No, it doesn't see that. So a lot of the system design, when you do it right, it's actually very clean.

in that way. And you can actually have a lot of guarantees around what the AI can do and what it cannot do. And I think those are all things that are still in an early design phase right now for many people. Yeah.
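
As a rough illustration of that boundary (a sketch under stated assumptions, not Lutra's actual implementation), the model is only handed high-level tool methods, while the OAuth token exchange and any secrets live behind them. The `SpreadsheetTool` and `OAuthTokenProvider` classes and the environment variable name here are made up for the example.

```python
import os


class OAuthTokenProvider:
    """Holds the refresh token and exchanges it for access tokens.

    Lives entirely outside the model's view; only the tool layer below
    ever touches it.
    """

    def __init__(self):
        # Assumption for the sketch: the secret comes from the environment.
        self._refresh_token = os.environ.get("SHEETS_REFRESH_TOKEN", "")

    def access_token(self) -> str:
        # A real implementation would do the OAuth refresh-token exchange
        # against the provider's token endpoint here.
        return "short-lived-access-token"


class SpreadsheetTool:
    """The only surface the AI can call: read and update a sheet.

    The model is shown these method names and their results; the token
    provider and any credentials stay hidden behind this boundary.
    """

    def __init__(self, tokens: OAuthTokenProvider):
        self._tokens = tokens

    def read_sheet(self, sheet_id: str) -> list[list[str]]:
        token = self._tokens.access_token()
        # Placeholder: a real version would call the spreadsheet API,
        # authorized with `token`. The model never sees the token,
        # only the rows that come back.
        return [["name", "email"], ["Ada", "ada@example.com"]]

    def update_cell(self, sheet_id: str, cell: str, value: str) -> None:
        token = self._tokens.access_token()
        # Placeholder write, again authorized by a token the model never sees.
        print(f"update {sheet_id}!{cell} = {value!r} (auth details elided)")
```

Because the model can only call `read_sheet` and `update_cell`, the guarantees about what it can and cannot do come from this tool surface, not from trusting the model itself.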

Yeah. Well, it is heartening to think about, because I think a lot of people react reflexively. They're like, oh, I'll never give AI access to XYZ. And you may not technically have to. You may be able to just give it a way to get, through your password manager, to what it needs, as opposed to actually exposing your passwords. Or you may have given it a budget so that it can go and make purchases on your behalf

but where there's still fine-grained controls and there's still you in the decision-making loop. In fact, I'll say that the UX there is actually really interesting and what we are starting to see is that rather than give it a budget, I think it's better for the AI to estimate how much things would cost. Come to you and say, I'm going to do X, it's going to cost about Y.

Are you okay with that? And you say, okay, go. And so it's actually flipped around: the AI should be smart enough to really figure out what it's going to do and the consequences, and then come to you and say, these are the consequences, are you okay with that? Yeah, that's a great observation. So the more we can give people that feeling of being in control, and they are in control, really, I think that's important. Yeah, well, Jiquan, it's been an absolute pleasure
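
A small sketch of that estimate-then-confirm flow, assuming a hypothetical `ProposedAction` produced by the agent's planning step; nothing executes until the person explicitly approves.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str       # what the agent intends to do
    estimated_cost: float  # the agent's own estimate, in dollars


def confirm_and_run(action: ProposedAction, execute) -> bool:
    """Show the consequences first; only act on an explicit yes."""
    answer = input(
        f"I'm going to {action.description}. "
        f"It will cost about ${action.estimated_cost:.2f}. OK? (y/n) "
    )
    if answer.strip().lower() == "y":
        execute(action)
        return True
    print("Cancelled; nothing was done.")
    return False


# Example usage with a stand-in executor:
if __name__ == "__main__":
    action = ProposedAction("order two replacement HDMI cables", 18.50)
    confirm_and_run(action, execute=lambda a: print(f"Executing: {a.description}"))
```

The design choice is that the person sees the consequences, the description and the estimated cost, before any side effect happens, rather than handing over a standing budget.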

grilling you on all these aspects of like AI development history, the state of the art, the limitations, the promise, the peril, well, not as much peril. Like you and I are both very optimistic about these tools.

And just the nature of the rate of improvement of different facets of AI, because I think a lot of people look at AI as just like a big monolith. It's actually a whole lot of different technologies, and a lot of it is not the actual AI itself, but the systems we build around the AI that makes all the difference in terms of actually getting things done and making life easier and better for humans. So I want to thank you again for sharing your expertise.

Cool. Thank you, Quincy. You know, it's amazing to be on the podcast with you guys and I'm so glad to share what we're up to and how we're thinking about this. Yeah. And as always, I've added some interesting links to the show notes or the video description if you're watching the video. And until next time, or until next week, I should say, because we are a weekly podcast. Happy coding, everybody. All right. Thank you again. Happy coding. And if you check out Lutra, let me know. It'd be great to hear your feedback on that. Thank you, everyone. Cheers. Cheers.