This is episode number 867 with Dr. Andriy Burkov, Machine Learning Lead at Talent Neuron. Today's episode is brought to you by the Dell AI Factory with NVIDIA.
Welcome to the Super Data Science Podcast, the most listened to podcast in the data science industry. Each week, we bring you fun and inspiring people and ideas exploring the cutting edge of machine learning, AI, and related technologies that are transforming our world for the better. I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple.
Welcome back to the Super Data Science Podcast. Today's episode is one not to miss with the super famous machine learning author, Dr. Andriy Burkov, who very rarely does interviews. Andriy wrote the indispensable 100-Page Machine Learning Book that seems to be on every data scientist's and ML engineer's bookshelf. His artificial intelligence newsletter is subscribed to by nearly 900,000 people on LinkedIn. That's insane.
He's the machine learning lead at Talent Neuron, a global labor market analytics provider. He runs his own book publishing company called True Positive. He previously held data science and machine learning roles at Gartner, Fujitsu, and more. And he holds a PhD in computer science with an AI specialization from the Université Laval in Quebec, where his doctoral dissertation focused on multi-agent decision-making 15 years ago.
Andriy's latest book, The 100-Page Language Models Book, was released a few weeks ago and has already been met with rave reviews online. I will personally ship five physical copies of The 100-Page Language Models Book to people who comment or reshare the LinkedIn post that I publish about Andriy's episode from my personal LinkedIn account today. Simply mention in your comment or reshare that you'd like the book. I'll hold a draw to select the five book winners next week, so you have until Sunday, March 9th to get involved with this book contest.
Despite Dr. Burkov being such a technical individual, much of today's episode should appeal to anyone interested in AI, although some parts here and there will be particularly appealing to hands-on machine learning practitioners.
In today's episode, Andriy details why he believes AI agents are destined to fail, how he managed to create a chatbot that never hallucinates by deliberately avoiding LLMs, why he thinks DeepSeek crushed Bay Area AI leaders like OpenAI and Anthropic, and what makes human intelligence unique among animals and why AI researchers need to crack this in order to attain human-level intelligence in machines. All right, you ready for this tremendous episode? Let's go.
- Andriy, welcome to the Super Data Science Podcast. I've been trying for years to get you on, so it's a great delight for me to finally have you on the show. Andriy, where are you calling in from today? - Hi, Jon, and thanks for having me. I'm calling you from Quebec City, Canada.
Very nice. Quebec City used to be... so, I grew up in Toronto, and when we were in high school, as people started to turn 18... the legal drinking age in Quebec is 18, but in Ontario it's 19. So we would organize... in Ontario we have March Break, which is a week off in the middle of March, and everyone would organize like a dozen buses from the high school to drive from Ontario to Quebec City so that we could take advantage of the 18-year-old drinking age there. They accepted your Ontarian ID cards as proof.
Yeah, of course, of course. Yeah, my daughters are like... I have two daughters, 18 and 17, so the oldest, she counted the days before she could enter the alcohol store and just order whatever she wants. And she...
Folks here, I know, are very proud that they can consume alcohol starting at 18, but in the U.S. it's starting at 20, I think 21. 21, that's right, exactly. Yeah. So then you get people from Vermont driving up to Quebec because they have the three-year gap there. It's an even bigger deal. University students taking advantage of that.
And yeah, but I don't think I've been back since. That was some time ago, 20 years ago now, that I was taking advantage of that. But I haven't been back, which is too bad, because Quebec City is beautiful. I think it's the only city in North America that has that European kind of vibe, because it still has the original walls from back when you had to have walls to protect yourself from the Americans. Yeah.
From the Americans or whoever wants to conquer the territory that you conquered first. Right, exactly. That's right. Or take back the land that's rightfully theirs, perhaps. Yeah.
But maybe that wall is going to come in handy again because of the rhetoric coming out of the U.S. recently. Yeah, it's nice. In Montreal, there is a small part of old Montreal where it also feels quite old. But yeah, this wall around the downtown, it makes it special.
Like, there are a lot of restaurants, people walk on the streets, cars are very rare, so it's nice. And especially now during winter, they always illuminate it, you know, they put up installations like sculptures made of ice, illuminated with different colors. So it's really like a postcard: you take pictures and you can send them wherever you want, like postcards.
Very Instagrammable for all our listeners looking for that perfect Instagrammable place in winter. Although I might recommend visiting in the summer when it's nice and warm. Yeah.
Nice. So you have either option. So yeah, so listeners, this message has not been sponsored or supported by the Quebec Tourism Board, but we do highly recommend checking out Quebec City for a unique city in North America. All right. So, Andriy, on to topics that maybe more directly interest our audience.
You have over 15 years of hands-on experience in automated data analysis, machine learning, natural language processing. You're currently the machine learning lead at a company called Talent Neuron. That's a data platform for global labor market intelligence. And so it helps businesses make workforce decisions. However,
you are best known for your best-selling trilogy of concise machine learning modeling books. So there was the 100-page machine learning book, which is, I see that constantly all over the world on people's bookshelves, people who are data scientists or machine learning engineers. I see your first book, the 100-page machine learning book, on their bookshelves. And now, your latest installment, the 100-page language models book, is out. It's just a month old and
in that new book, The 100-Page Language Models Book, in the preface, you describe how your interest in text developed and how the complexity of extracting meaning from text fueled your determination to crack it. So can you walk us through your interest in language modeling, through to today,
and how close we have come to cracking it. Yeah, well, I just should correct one thing. So it's not a trilogy... or triology, sorry. Yeah, trilogy. So it's like a duology with a spinoff, because I started with The 100-Page Machine Learning Book and I didn't expect to write anything else, because I
thought that it would be kind of a trick if I write a hundred-page machine learning book and then I continue to write about machine learning, and it would be like, okay, so it's not really a hundred pages, it's actually much more than that.
But then COVID happened and I just looked for some project to do because we all stayed home and had nothing to do. So I decided to write a kind of a machine learning engineering book. So it's not about machine learning, but more like how to apply it for solving real business problems.
And then large language models happened. And large language models, for me, it's like a totally
different story. Like, yes, it's still machine learning, but it's so important on its own. And there are so many new developments, both scientific and engineering, that happened during the last two years since ChatGPT was released, that I
felt it wouldn't be a trick to write a book just about language models. And what I also wanted to avoid is writing just another book on LLMs, because if you go to Amazon today, you will see maybe dozens, maybe even hundreds of titles with LLMs. So I wanted to show the progression of the field, like how language models evolved, because probably 99% of people
heard about language models only two years ago, because of ChatGPT. But the science around language models has existed since the '60s of the last century. So people, I mean scientists, always tried to create algorithms that would allow a machine to communicate like a human.
And the most successful approaches in the past were what we call count-based language models. These are basically just statistics. So you take a large text collection and you take what we call n-grams, sequences of words, and you count how many times the word "horse" is preceded by the n-gram "she rides a."
So you say, okay, "she rides a" is followed by "horse" a hundred times in this collection, and it's followed by "car", I don't know, 70 times. So if you want the machine to generate text, it will just find the word with the highest count and it will generate "she rides a horse."
So this approach was quite successful, but the problem with it was that it scaled very poorly, because you needed to calculate all possible statistics for all possible n-grams. And if you want your model to be accurate, your n-gram should be long. For example, like today, you might put a thousand words as an input and you want the model to generate the next one. So if you want to count all possible n-grams of a thousand words, it grows very fast in volume. These count-based models, by the way, are still used in our smartphones. For example, when you type something to your friend and, let's say, you often type something like, okay, what do we do this night?
So it will remember "what do we do this", and it will remember that "night" quite frequently follows these words, and it will suggest it to you as the first option. For this, neural networks aren't used. That would be overkill, and it would be really slow, because you would have to retrain the neural network every time. But count-based models on very short context work very well.
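To make the count-based idea concrete, here is a minimal sketch (not Andriy's code, just an illustration) of a trigram count model that picks the next word by frequency:

```python
from collections import Counter, defaultdict

# Minimal sketch of a count-based (trigram) language model: count how often
# each word follows a two-word context, then predict the most frequent one.
corpus = "she rides a horse . she rides a bike . she rides a horse".split()

counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def predict_next(w1, w2):
    """Return the most frequent word seen after the context (w1, w2)."""
    followers = counts.get((w1, w2))
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("rides", "a"))  # -> 'horse' (seen twice vs. 'bike' once)
```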
So my book is like, it's a history of where it started and then where we are today. And my personal fascination with the topic started because I started to work...
with the internet in 1998. So I was 18 years old. The internet was really kind of a new thing; very few people actually used it. And even in my city, in Sevastopol, it was hard to find a phone line that doesn't have noises, you know. Previously you had to dial in with the modem, and if there are noises, the connection drops.
So the company that operated landline phones, well, not just for me, but for some group of people who needed this kind of stable connection, they created a special landline so that we could connect. So, yeah, and when I connected to the internet, for me, the obsession was: there is so much information, but you really have to, you know, extract it manually. So you go to one website, you read it or you copy something, you save it.
So I thought that if we can automate this process, that this would create kind of an interconnected automated information exchange.
So, yeah, and I started to create some kind of scrapers, like a robot that can go to some website, detect that something new happened, that some new information appeared, extract it, and send me an email. For example, I was interested in games, and there was a website that published new articles about how to solve problems in this game or the history of this specific game series. I was interested to know how it works, but they didn't have any, you know, alert like, "Okay, you subscribe, we send you emails." So I had to really scrape the additions to their website, and this is how I started.
This episode of Super Data Science is brought to you by the Dell AI Factory with NVIDIA, two trusted technology leaders united to deliver a comprehensive and secure AI solution customizable for any business. With a portfolio of products, solutions, and services tailored for AI workloads from desktop to data center to cloud, the Dell AI Factory with NVIDIA paves the way for AI to work seamlessly for you.
Integrated Dell and NVIDIA capabilities accelerate your AI-powered use cases, integrate your data and workflows, and enable you to design your own AI journey for repeatable, scalable outcomes. Visit www.dell.com slash superdatascience to learn more. That's dell.com slash superdatascience.
Oh, that's a lot of history with this. Today, you have scaled this up tremendously. You're not having to worry about noise on a landline anymore. At Talent Neuron, you're collecting over 700 million job posting data points daily, and then you're using language modeling methods to deliver insights, which you've said previously amount to 95 million daily predictions based on those 700 million job posting data points. And obviously each of those job posts has a huge amount of information on it. So can you tell us a bit more about that work? Obviously you can't go into proprietary details, but kind of linking
the history of what you've been doing, what you cover in the book to what you're doing today? - Yeah, this is why actually I, well, by the way, I'm working at Talent Neuron, but I didn't join Talent Neuron. I joined the company whose name was Wanted Analytics, and this was a local Quebec City company that was created by a local, Jan De Lille, who is an entrepreneur here from Quebec City.
And when I joined, we were about 40, probably 30 people. And two or three years later, the company was acquired by a multinational American consultancy called CEB.
And we were under CEB probably for another year or two. And then CEB was acquired by Gartner. So then we were under Gartner for another two or three years. And now... well, two years ago, Gartner sold
our business as a separate entity, and this is how I ended up working at Talent Neuron. So basically I changed four companies without changing my chair. This was quite funny. Yeah, so the product that we have is based on... like, the goal is to provide actionable insights to
people responsible for recruiting or workforce planning in enterprise. So how we do it is that we have robots that go to different
corporate websites, job boards, some aggregators, applicant tracking systems, those sources that contain jobs. We call them jobs, like it's job posting when you look for a job, you open something and it says like the title is this and this is the description, the conditions and so on. So we download these postings daily
So we have about, I think today, about 35,000 to 40,000 such robots. And these robots, they are not intelligent. They are fixed. So basically, they know how a specific website works. They know what to click to go to what page. They detect that something changed and they download based on some predefined rules. And for the most part of my history in the company, my team was responsible for post-download. So when the job is already in your database and you have to normalize it or extract something from it. So we worked on all sorts of projects. For example, a typical job posting contains skills.
So we created a system that detects different skills in the job posting, extracts them, and normalizes them, because sometimes it's written, for example, Java Script as two words, sometimes it's JS, sometimes it's JavaScript as one word. So we need to detect that it's all the same skill.
Sometimes it's funny when they say you must excel in Word. So in this case, Excel is not actually a skill, but Word is. So there are plenty of interesting NLP, natural language processing challenges here.
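As a toy illustration of the normalization step described here (the aliases and IDs below are invented, not Talent Neuron's actual taxonomy), the mapping might look like this:

```python
# Toy skill normalizer: map surface variants found in job postings to one
# canonical skill entry. Aliases and IDs are invented for illustration.
SKILL_ALIASES = {
    "js": "javascript",
    "java script": "javascript",
    "javascript": "javascript",
    "ms excel": "excel",
    "excel": "excel",
}

SKILL_IDS = {"javascript": 123, "excel": 456}  # hypothetical internal taxonomy IDs

def normalize_skill(raw: str):
    """Return (canonical_name, skill_id), or None if the variant is unknown."""
    key = raw.strip().lower()
    canonical = SKILL_ALIASES.get(key)
    return (canonical, SKILL_IDS[canonical]) if canonical else None

print(normalize_skill("JS"))           # ('javascript', 123)
print(normalize_skill("Java Script"))  # ('javascript', 123)
```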
So, since the first language models were released... probably you know about BERT and RoBERTa, the first transformers. Google released BERT and then Facebook released RoBERTa. These were what we call encoder language models, so they cannot
talk. Okay, they cannot be used to generate text, but they understood text very well. So we adopted these first transformers probably somewhere around 2018, 2019 to, for example, predict the industry. So the job talks about the company, and from the company description the model can predict what industry it is. Those models are also good as classifiers, so if we want to distinguish "excel" in one context versus "Excel" in a different context, we can train such a transformer to actually read not just the word "excel" itself but its surroundings in the text and make a prediction.
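For illustration only, here is one way a transformer that reads the whole sentence could disambiguate "excel"; this uses an off-the-shelf zero-shot classifier from Hugging Face as a stand-in for the fine-tuned encoder classifier Andriy describes, so it is not what Talent Neuron actually runs:

```python
# Illustration only: a transformer reads the whole sentence, not just the word
# "excel", and decides which sense is more likely. A zero-shot classifier
# stands in here for a properly fine-tuned encoder classifier.
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = [
    "Excel spreadsheet software skill",
    "to excel, meaning to do something very well",
]

# Prints whichever label the model scores highest for each sentence.
print(clf("You must excel in Word and PowerPoint.", labels)["labels"][0])
print(clf("Advanced Excel and VBA skills required.", labels)["labels"][0])
```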
Then it all became multilingual. Previously, historically, machine learning was monolingual.
We had a salary extractor, for example, and we supported about 25 different languages. So for each language, in the past, we created a salary extractor for that specific language. So we really needed to label examples in Chinese, in Japanese, in Russian, and so on. But today, it's all so much simplified. And again, it's just a very recent trend that most of the models that are being released
are now multilingual. And for me, even five years ago, probably I wouldn't be surprised, but 10 years ago, if you told me that the same model can accept whatever language it is and output whatever language you want, you would say, no, it's crazy, it cannot work like this. But today, this is what we have. And now, for example, for extracting the salaries for every new country that we add,
We don't label the data for these new languages by hand anymore. We don't label probably at all. We just reuse already labeled data for other languages to train the model for this new language because like a salary, it's a salary. So no matter what language you use to describe it, it's still a salary. And the models are really strong now to generalize across languages.
So this was post-download, but now we also work a lot on what we call pre-download. For example, those robots that I mentioned that come to some website and find the jobs to download. So now we work on a system that builds these robots automatically from scratch. Basically, you just say, here is the home URL of a corporate website.
Go find the career section, identify where the listing is, like where all the links to the jobs are, and create rules to extract different elements from this listing, for example the title, the location, the posted date. Then click on those links, visualize the description, and extract the description itself. And if it contains any additional attributes, like industry or perks and so on, also create rules for extracting those. So now we don't
have to create those scripts or robots manually, because historically, again, it was very difficult for our software engineers to automate every specific website, because all websites are different, programmed using different technologies by different people. Some are programmed well, some less well, some are really ugly. So our developers really struggled to create those scripts and,
especially, to fix them: you know, something small changes on the website and the robot is no longer capable of gathering information, so you have to open the script, look inside at what happened, why it's no longer working, and so on. And now, with this automated script creation, we can automate at least half of the websites fully. And for the other half...
It's still manual work, but those websites, we consider them challenging. And challenges are usually more interesting for humans to solve than just something routine. Yeah, you're giving a great example there of the kinds of capabilities that the large language model revolution is unlocking for us. There's so many things that now can be done in an automated way. Like you're saying there, half of the...
half of the time you can identify on a website automatically what the formatting should be and download it, as opposed to having to hand-code it. And I'm sure that percentage is going to go up and up over the coming years. Well, yeah, but as a result of the excitement... I think, probably, well, you definitely follow this media craze and hype around AI, around automated agents, that they will solve problems for us. As I mentioned, we have extensive experience now in trying to use those LLMs out of the box
to help us in organizing or extracting information. And it's not as beautiful in reality as it is in presentations. And the problem is that LLMs are really good at the problems that we call in-distribution. And in-distribution means that the problem you ask it to solve is similar to the data that the LLM saw when it was trained.
And the problem for businesses specifically... maybe for individuals it's less of a problem, but for businesses the problem is that we don't know what is in distribution and what is not. Because to know it, you actually have to have access to the entire dataset that was used to train the model. And this dataset is hidden. Probably with a couple of exceptions,
all LLMs, including those that we call open weight, are not really open. So you can use them, yes. You often don't even have to pay for them, yes. But you cannot really tell whether your specific business problem is in distribution or not. So you can test.
Let's say you develop a system based on LLMs and you provide some tests, some inputs, and it's kind of cool. You give it a problem, it solves it, you see the solution, makes sense.
But when you put it in production and real situations start happening, not those that came from your head when you decided what would make a good test, but real data... and this real data may not be aligned with what you think the data in production will look like. And in this case, the LLM can become arbitrarily wrong, or make wrong decisions, or output wrong information. And because we don't really have any detection mechanism,
And of course, we don't have any prevention mechanism. So when you put systems that are based on LLMs in production, this is where the nightmare starts. Because while you are coding, testing in your controlled environment and you are happy, everything is good. But then you put it in production. What does it mean in production? In production means in front of the users.
And this is where you start getting into trouble and users get angry, and you say, okay, we will fix it. But you have no idea how to fix it, because you're blind.
You know that it doesn't work, but you have no idea what to do to make it work, because you don't know how close it is to the distribution. Maybe it's just close enough and you will just add a couple of examples to fine-tune it, and that's it. Or maybe the use case was so far from the distribution that no matter how many examples you give, there will always be cases around this area
that will still not work. So this is where I really recommend anyone with decision power listening to this podcast to think twice, maybe even three times, or maybe 30 times, before you actually decide to put something LLM-based in front of your users. Because
it might sound cool, like, oh, look, we use LLMs. But then the lost reputation and angry customers, this is not something you will find cool.
AI is transforming how we do business. However, we need AI solutions that are not only ambitious, but practical and adaptable too. That's where Domo's AI and Data Products Platform comes in. With Domo, you and your team can channel AI and data into innovative uses that deliver measurable impact.
While many companies focus on narrow applications or single-model solutions, Domo's all-in-one platform is more robust with trustworthy AI results, secure AI agents that connect, prepare, and automate your workflows, helping you and your team gain insights, receive alerts, and act with ease through guided apps tailored to your role. And the platform provides flexibility to choose which AI models to use.
Domo goes beyond productivity. It transforms your processes, helps you make smarter, faster decisions, and drive real growth. The world's best companies rely on Domo to make smarter decisions. See how you can unlock your data's full potential with Domo. To learn more, head to ai.domo.com. That's ai.domo.com.
Yeah, digging into the point that you just made about some of the limitations around agents, you emailed to me ahead of us recording this episode. I was actually, I was caught off guard. You said something like,
Agents won't fly. And in this time where everybody is talking about agents... so just as some examples, Jensen Huang, the CEO of NVIDIA, said recently at the Consumer Electronics Show that 2025 is the year of AI agents. Salesforce CEO Marc Benioff is equally bullish, predicting AI agents will take over the labor force soon.
Andrew Ng, Andrej Karpathy, they both say that agentic AI will revolutionize labor and is paving the way to AGI. But so when you and I were discussing over email potential topics that we could cover on the show, you said to me, AI agents won't fly. And so I was so surprised by that. I mean, I knew that you might mean they won't work.
Yeah.
And then you went on to say that more than one agent working together, a multi-agent system, is undebuggable. So, I don't know, do you want to dig into this a bit more? Well, I have a couple of comments. First of all, Karpathy, I respect him very much, and
he is very cautious in his choice of words. So he never says something like, okay, agents will replace humans. He posts his ideas about how it might be, but he never predicts that it will actually happen. But those who you mentioned, like, for example, the NVIDIA CEO... well, we should understand that these people, when they speak, they don't speak, you know, from the heart. They speak as representatives of a huge company that should be responsible in front of their shareholders. And if saying something increases their share value and it's legally permissible,
they do it. So he knows that if he says that agents this year will become huge, it will mean for investors that you need to buy more GPUs, because if everyone runs agents and you don't have GPUs to run them, then you lose. So saying something like this just works well for his specific business.
Salesforce, the same thing. When they started, they said that, okay, traditional software is dead.
Now it's software as a service, and they even had a logo where it says "software" crossed out, like, okay, let's move on. But traditional software didn't go anywhere. Yes, there is a lot of SaaS, but there is a lot of traditional software too. And now they say, okay, well, SaaS is dead, now it's agents. Will it happen?
I really doubt that. Yes, some use cases, agents will be probably good. Some use cases, again, if we look into this in distribution versus out of distribution. So the best use case you can imagine for an agent is the agent that gathers information.
like, well, crawl the web, find some interesting documents, some relevant documents for your business, for decision making, extract and kind of aggregate them into some report, and send it to some decision maker. Why it would work is because LLMs were trained on web data. So for them, web data is the closest to in-distribution that you theoretically can get. So of course, if you say that my agents are agents that crawl the web and extract pieces of relevant text, yeah, why not? It might work. Is it a huge use case? Does everyone need agents that crawl the web and extract relevant information? Some might, some probably not. Some might say, I can just Google the information that I need. Or, for example, I have Google Alerts: every time someone mentions my book online, I receive the alert.
Is it an agent? Someone might say it's an agent, but it's just a cron job that runs a search on Google index. So these people, they are interested in promoting their business, and this is what they say. But
talking about multi-agent systems: my PhD was in multi-agent systems. So if I understand something in AI, one might say that it's multi-agent systems. So the biggest challenge with multi-agent systems and any distributed systems is to debug them. Debugging is hard because there are multiple pieces. Like, when you debug typical software,
Like, I don't know if you have an experience coding. For example, some function doesn't work or like the code enters this function and then it crashes. So what do you do?
You run the debugger, you put a breakpoint in your function and you run the code. The code runs until it reaches the breakpoint, then it stops. And then you have this next, next, next, like a step-by-step, where you can execute each command or operation one by one. And by doing this, you also can observe how the values of all variables in the environment change.
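For listeners who haven't used a debugger, here is what that step-by-step inspection looks like with Python's built-in pdb (a generic illustration, unrelated to any specific agent framework):

```python
# Step-through debugging as described above, using Python's built-in debugger.
def compute_discount(price, rate):
    breakpoint()               # execution pauses here; inspect variables with `p price`
    discounted = price * rate  # bug: should be price * (1 - rate)
    return discounted

print(compute_discount(100, 0.2))
# At the pdb prompt: `n` steps to the next line, `p discounted` prints its value
# (20.0 instead of the expected 80.0), which is how you spot the faulty expression.
```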
And this is how, as a human being, you can detect that something is wrong, and this is how you will update your code. Now imagine that you do this for one of your agents.
and then 25 others are still doing something. And you cannot stop all of them at the same time, okay? Because all of them are independent pieces of software, so they will still be operating while you try to debug one of them. And this is when you actually control them.
But with LLM-based agents, you don't control them at all. You cannot debug an LLM. An LLM is a neural network. It's a black box. There is nothing to look inside to say, oh, why does the information flow through this specific part of the neural network? This is not how it's done. It either works or it doesn't. So imagine if you have...
20 or 50 such agents, especially when they interact with one another. Because it's one thing to debug, you know, 25 independent agents, but when they all collaborate together to provide some final result, it's crazy. And debugging a distributed system is difficult because of this: it's asynchronous. Every process runs independently of every other. You cannot really stop the whole system and debug it. So this is why I'm very skeptical about agents. As I said, for some very specific use cases, it will work. But imagine you have agents that should navigate your
intranet, not the internet, but your intranet, with all those legacy software systems that you have. You have software that contains your employees' salaries and performance and so on. You have SharePoint with some outdated information. You have your Git with code. You have your documentation, everything. And you put agents in there,
and they don't know anything about any of your internal systems. They see them for the first time. And you think that just by telling them, okay, you are a helpful, intelligent agent, you can walk through different applications in my intranet and find issues and fix them... Come on, let's be realistic. They will break their teeth quite fast.
All right. Yeah, that's a compelling argument. So our researcher, Serg Masís, he had a question for you that... you know, maybe this is a tricky question, because maybe there isn't an answer. Maybe
Multi-agent systems just can't be debugged. But, you know, given your PhD in this, maybe you have some insight into some kind of alternative architectures or design principles that we could use to create robust and maybe even interpretable AI systems for complex tasks.
Well, I think if we are realistic, okay, any multi-agent system must give you 100% control over every actor in the system. Well, we can call them agents, okay? So if you can control every agent, you can design a specific schedule and a specific communication interchange protocol that will allow you to detect agents' bugs or misbehavior. So, for example, you can analyze how agents exchange information, what was in those packages, and detect that something was abnormal.
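A hedged sketch of what such a controlled communication protocol could look like: every message between agents passes through one bus that logs it and flags anything violating the agreed schema. The field names here are purely illustrative, not from any real framework:

```python
# Sketch of a controlled message bus for a multi-agent system: every exchange
# is logged in one place and checked against a schema, so abnormal messages
# can be flagged. Field names are illustrative only.
import json
import time

REQUIRED_FIELDS = {"sender", "receiver", "intent", "payload"}

class MessageBus:
    def __init__(self):
        self.log = []  # full trace of every exchanged message

    def send(self, message: dict):
        record = {"ts": time.time(), **message}
        self.log.append(record)
        missing = REQUIRED_FIELDS - message.keys()
        if missing:
            # Misbehaving agent detected: the message violates the protocol.
            raise ValueError(f"malformed message from {message.get('sender')}: missing {missing}")
        return record

bus = MessageBus()
bus.send({"sender": "planner", "receiver": "scraper",
          "intent": "fetch", "payload": {"url": "https://example.com"}})
print(json.dumps(bus.log, indent=2, default=str))
```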
So with LLMs, as I said, these are black boxes, so you really have zero control over how they think, how they make decisions, and so on. So I think that, if we want to be realistic, that in the future there will be some agents that will do the job for us and we can sleep at night without worrying that they will take over some nuclear power plant,
enter some nuclear codes and launch nuclear missiles. I think, yeah, it should be something similar to what they call AGI, like artificial general intelligence, where
at least we can trust this AGI the way we trust a regular human being. And you know that in security, for example, you want to secure some important object or you want to control access to some important briefcase,
there is never one person, okay? And we even saw in movies, like, to just open a door, two people must be at a significant distance from one another so that one person cannot use two keys, and then two people must turn these two keys at the same time. Why is this done?
This is done because we as humans are unreliable. Okay. So if we want something secure, something stable, something we can sell to our customers and, you know, say it's a good stuff.
It should be as reliable as a human-based system. But today, no one will argue that those agents that we talk about are anywhere close to being as reliable as a human. So until this happens, building multi-agent systems with such agents is a recipe for disaster. So somewhere in the future, we will have this AGI, and we will see if we can trust it and if we can create systems similar to human-based systems with, you know, these additional levels of security by doubling, tripling people. But not today. And as far as I know, today there is no one having a clear idea how to reach this future with these AGIs being the real thing and not, you know, something from science fiction.
Eager to learn about large language models and generative AI but don't know where to start? Check out my comprehensive two-hour training, which is available in its entirety on YouTube. Yep, that means not only is it totally free, but it's ad-free as well. It's a pure educational resource. In the training, we introduce deep learning transformer architectures and how these enable the extraordinary capabilities of state-of-the-art LLMs.
And it isn't just theory. My hands-on code demos, which feature the Hugging Face and PyTorch Lightning Python libraries, guide you through the entire lifecycle of LLM development, from training to real-world deployment. Check out my generative AI with large language models hands-on training today on YouTube. We've got a link for you in the show notes.
In your work as both a real-world AI system developer as well as through the books you've written and recently this huge amount of expertise you've developed in language models to write this book on language models, you probably have an interesting perspective on AGI and when it could be realized. You just mentioned there that we might have it in the future. Do you want to hazard any guesses in terms of timeline? When I say that we may have it in the future...
It's like to say we may have teleportation in the future and it might work. So, yes, it can work because if we humans, we are conscious, then something in nature changed. And I mean, changed compared to our predecessors.
So we evolved somehow into humans. Because what is the difference, the biggest difference, between humans and the rest of the animals? Humans can plan over an infinite horizon. So some monkeys, like chimpanzees, the most developed ones,
they can use tools because previously it was considered that only humans can use tools. But now, after decades of research, we know that even some birds can use tools. For example, I think that it's crows that have a nut and they can throw it from a height.
and it falls and it cracks. And even when they live in the city, they can wait for a car: they throw the nut, the car rolls over the nut, and the nut is broken. So they use tools.
Some monkeys can even use tools. But most animals use tools only in that specific moment; they will not keep their tools for tomorrow. But some monkeys will. For example, you give one monkey a stick,
and only with this stick she is able to get a banana. So she will get a banana and when she goes to sleep she will put this stick
under her belly when she sleeps, because she knows that tomorrow she will also need a banana. So this means that some animals can plan one day in the future, two days. But if you remove bananas for more than three or four days, she will throw away the stick. She will not think that maybe in five days bananas will be back. But humans...
will think, I will still keep this stick, because who knows. And we can even plan many years ahead, even hundreds of years, thousands of years. Today we think about saving the planet. So we think about reducing the consumption of plastic and we think about the
global warming issue. Why do we do it? Like, we will die maybe in the next 60, 70, 80 years. The planet will be still fine. We do it for the next generation, for our kids, for their kids, and so on. So this is what we managed to gain somehow through evolution. So now the question is, how can we get this AGI? So basically, the answer is,
Like what inside us is different that makes us planners for infinity versus every other living creature on this earth? If we can answer this question, I think this will be probably the biggest breakthrough because this is something that our LLMs or whatever neural network you talk about, this is what they don't have.
they don't have the ability to actually plan. So they are reactive. You ask a question, it gives you an answer. Even if you call it an agent, it doesn't really have agency. It might act as an agent because in the system prompt you said,
You are an agent and your goal is to provide your users with the best information on a specific topic. But this agency didn't come from the agent itself. It came from you. So you instructed it to be an agent. And because the LLM doesn't really understand what it does, it just generates text.
sometimes this agency will be violated. So it will not do what you want it to do, and you cannot really explain why. It's like a black box: it works or it doesn't, and you don't know why. So if we answer this fundamental question, what makes us planners for infinity,
I think that this is where we will get one step closer to AGI. - Yeah, I would suspect that some of the answer lies in our prefrontal cortex and the ratio of prefrontal cortex that humans have relative to other primates.
that allows us to kind of maintain a loop through our other sensory cortices over an extended period of time. Which brings me to a point that I've talked about on this show before, which is that it seems to me, and it sounds like it may be the case for you as well, that cracking AGI may require modeling the neuroanatomy of a brain, of a human brain perhaps, in a more sophisticated way than just scaling up a single kind of architecture like a transformer. That we might need different kinds of modules, so that we have something like a prefrontal cortex that can be doing this kind of infinite-horizon planning that you're describing. And so you'd have different parts...
that are connected by large connections, kind of pre-planned, as opposed to just allowing all of the model weights to be learned in a very fine way across the entire cortex in the same way, across the entire neural network in the same way. Yeah, and it's not only... well, I simplify it a bit by saying that this is just one thing that makes us different. But another thing that we also have, and LLMs, for example, don't, is that humans somehow have a feeling about what they know and what they don't know. Okay, so for example, I ask you about, I don't know, astronomy, okay? Or about the universe, stars, or galaxies.
And if it's not your domain, you will tell me, you know, Andriy, I like to talk about these topics, but if for you it's something critical, you probably should talk to a specialist, because I can only tell you that, you know, planets spin around stars. This is what I know. But LLMs don't have this mechanism to detect that
what you ask about wasn't part of its training data, or it was, but not at a level of detail granular enough to have an opinion that is worth sharing. So it will still answer you. For example, I made a test a couple of days ago with this o3-mini
from OpenAI, I wanted to see, like, because...
All models, all LLMs, have been trained on web data. And on the web, there is a lot of information about my first book. But my third book just came out, so there is really little information. And I'm sure that their cutoff was earlier than the book was released, so they should not know anything at all about it. So I asked o3-mini, like, is my 100-page language models book good?
And what is interesting is that previously you couldn't see this. Currently, they show what they call a chain of thought, this internal discussion, before they provide an answer.
And I read this chain of thought and it's funny. It starts by saying, okay, so he asks about this book, but this book looks very different from the previous one. So probably it's some new book. Okay, what do I know about this new book? Not much. Okay, so what do I know about the previous book? Oh, the previous book is XYZ.
So this discussion, and then it starts releasing the final answer where it just says that, yeah, this new book is very good. It's praised by experts and by readers, and it delivers content in a very good way. And I'm like, where does it come from? It just made up.
the recommendation, and it's based on its internal discussion in which it says, "Yeah, but I don't have anything about this book, but given that Burkov has a great reputation, this is what I might say." But it doesn't tell you in the official answer that it's pure speculation.
It answered this just as if it was the real deal. So this is where, you know, the LLM cannot really understand the difference between I'm sure about this, I am less sure about this, and
I can be totally wrong. So again, if we can solve this, this will be an additional step toward AGI. So a model that can be reliably self-observing and self-criticizing, saying: I would love to help you, but here I feel like I'm in a domain where
I cannot be reliable. And by the way, they try, they try to fine-tune models to say this,
but it doesn't work this way. So basically, for example, some models, especially those released by Qwen, the Chinese company... they decided to fine-tune their models to say, I don't know this person. So previously, for example, there is information about you online, so you can ask a model, who is Jon Krohn?
And it might say, well, he's a podcast host, book writer. But it might also say that you are a Ukrainian footballer like me.
So, to avoid being, you know, ridiculed... people Google themselves, people ask about themselves; they know that some information about them is online, but what comes out is totally made up. So they decided that they would fine-tune their models to say, I don't know anything about this person. And they fine-tuned it by giving the names of really famous people,
and they say, "Let's go answer." And then they give some just random names, people who don't exist online or very small footprint. And they say, "Answer, I don't know." But it's funny because I ask, "Who is Andrey Burkov?" It says, "First time I hear this name, don't know anything." And then I say, "Who wrote the 100-page machine learning book?" "Oh, it's written by Andrey Burkov." Like, "You just told me that you don't know."
So no, they try, you know, to create some hacks around it, but it's not really training a model to recognize where it can be wrong. I've noticed a related hack recently in Claude outputs, where it's something you can tell is probably not directly a part of the core LLM but, again, something that they've tacked on top, where I'm now frequently seeing in Claude responses things like: this is a relatively niche topic, I don't have that much information, you might want to double-check this. And I find that they're being really conservative with that, where I'm getting that frequently on questions I ask about things that I don't think are particularly niche.
And so maybe there's some fine-tuning that they need to get right there. That kind of problem seems like something that these big LLM trainers are working at, and they're probably all taking different kinds of approaches. You have actually...
you wrote on LinkedIn that you developed an enterprise chatbot that doesn't hallucinate, which seems related to this. So yeah, hallucination, having this kind of confidence about things that the LLM doesn't know anything about, it seems like you've achieved something here. So how did you accomplish this? Well, yeah, so the only way to make a chatbot not hallucinate
is to not use an LLM to generate the output. So
we all know that RAG, retrieval-augmented generation, decreases the level of hallucinations quite significantly. So, for example, if you ask about machine learning and you pulled the data from Wikipedia, from the machine learning article, and you answer the user's question based on this Wikipedia article, then the chances that you will say something entirely wrong
are quite small. There are still chances, but quite small. For example, compared to just answering out of the box without doing any retrieval, where maybe you will hallucinate 20 to 30 percent of the time, with retrieval-augmented generation it's maybe one to two percent. So it's still there, but not a lot.
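For readers who want to see the retrieval-augmented pattern being contrasted here, a minimal sketch follows; the retriever is a toy keyword scorer over an in-memory list, and llm_complete is a placeholder for whatever LLM client you actually use:

```python
# Minimal sketch of retrieval-augmented generation (RAG): retrieve relevant
# passages first, then ask the model to answer ONLY from them.
DOCS = [
    "Machine learning is a field of study concerned with algorithms that learn from data.",
    "Quebec City is the capital of the province of Quebec.",
]

def retrieve(question: str, top_k: int = 1):
    """Toy keyword-overlap retriever standing in for BM25 or vector search."""
    scores = [(sum(w in doc.lower() for w in question.lower().split()), doc) for doc in DOCS]
    return [doc for _, doc in sorted(scores, reverse=True)[:top_k]]

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = ("Answer using ONLY the context below. "
              "If it is not sufficient, say you don't know.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm_complete(prompt)  # placeholder: your actual LLM call goes here
```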
So what we decided to do is exclude any possibility of hallucinations. So basically our chatbot is not an open-domain chatbot. This is a very big advantage for any machine learning team, working with a closed domain versus an open domain. So, for example, OpenAI, Anthropic, Google with Gemini,
they all work in the open domain. There is zero chance that they can create some kind of fixed templates for every possible kind of answer.
But if you work with a closed domain like ours... our SaaS can answer users' questions like, what are the top skills for a Java developer in Chicago? Or how difficult is it to hire a registered nurse in San Francisco?
So all this information can be directly pulled from our internal APIs. For example, you provide the occupation, you provide the location ID, and you call the API about salary. So it takes your occupation, it takes your location, it pulls the salary distribution from the index, and then you just
show it to the user. So what we decided to do is create a set of predefined templates. For example: okay, so you are looking for the average salary for a nurse in San Francisco. "The average salary for a nurse in San Francisco is...", and then there is a placeholder for a number. And we pull this number from the API and we show it. So
the possibility of hallucination here is zero. There is a possibility of an error in how we interpret the user input. But we always show our interpretation. For example, let's say the user asks for someone with the JS skill. Before we show any number to the user, we need to normalize, to convert this JS
into our internal skill taxonomy. So we take this JS and we use our internal skill normalizer and it says, okay, JS, it's skill number one, two, three.
So we show the user, okay, you are looking for someone with the JavaScript skill. So the user sees exactly how their input was understood by the machine. And then the user sees the output, and the output also comes directly from the database. So there is no hallucination. Hallucination is when you see some number and you are not sure whether this number represents what you asked for or it represents something arbitrarily different from what you asked for. But in our case, because it's a closed domain, we say, okay, so occupation code A, skill number one, two, three, location, it's San Francisco, California, US. It's all shown in different... we call them pills; they're all normalized labels that you see.
And then you see a number. So yes, the number can be wrong, but the number can be wrong not because we made it up, but because the distribution of jobs corresponding to your search
doesn't reflect reality. But you would get exactly the same wrong answer if you used the system directly through the traditional UI. So there is a one-to-one correspondence between what you see in the chatbot and what you would see on the platform if you didn't use the chatbot. So this is what we call zero hallucination. But of course, errors...
errors can always be there; some errors we can control, but some just come from the data that we gather online. And this data is never 100% perfect. Right. Of course. Yep.
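A schematic sketch of the template-plus-API pattern Andriy describes (this is an illustration, not Talent Neuron's actual code; llm_parse_query and salary_api are placeholders): the LLM only parses the question into a structured query, and the text shown to the user is a fixed template filled from the internal API, so no free-form generation reaches the user:

```python
# Schematic sketch of the zero-hallucination pattern just described: the LLM
# only parses the question into a structured query; the reply is a fixed
# template filled with values pulled from an internal API.
TEMPLATE = "The average salary for a {occupation} in {location} is {salary}."

OCCUPATIONS = {"nurse": {"id": 101, "label": "Registered Nurse"}}          # toy taxonomy
LOCATIONS = {"san francisco": {"id": 7, "label": "San Francisco, CA, US"}}  # toy taxonomy

def answer(user_question: str) -> str:
    query = llm_parse_query(user_question)   # placeholder: LLM turns text into {"occupation": ..., "location": ...}
    occ = OCCUPATIONS[query["occupation"]]   # normalized "pill" shown back to the user
    loc = LOCATIONS[query["location"]]
    salary = salary_api(occ["id"], loc["id"])  # placeholder: number comes straight from the database
    return TEMPLATE.format(occupation=occ["label"], location=loc["label"], salary=salary)
```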
So that is an interesting approach: avoiding the LLM in order to avoid hallucinations. Yeah, but we do use LLMs in the process. We use LLMs to understand the user input, because the user input is just a string, and we need to convert this string into some structured format.
And then every piece in this structure, we need to normalize. So yes, the LLM is used to understand, but not to inform. Gotcha, gotcha. That kind of sounds like, at the top of the episode, when you were talking about BERT and RoBERTa for encoding natural language into some other kind of representation. So it's an interesting kind of callback to that. Before we wrap up this episode, I feel like we've got to talk about DeepSeek. Yeah, yeah. Yeah.
It's what everyone's talking about these days. And so, uh,
You wrote to me in an email that DeepSeek crushed OpenAI and Anthropic. What do you mean by that? Well, I made a post on this last week, by the way, so I refer everyone to read more detail there. I think that DeepSeek is probably the most important thing that happened to language models since the release of ChatGPT.
And it's not in terms of, okay, this model beats that model. We already saw multiple examples where some new model beats a previous one, and then the company that created the previous model releases something new, and now again it's state-of-the-art. So it's not in this sense. It's more in the sense of what DeepSeek did. They... okay, I will enumerate. So the first thing they did:
they trained a state-of-the-art model using a very tiny budget. So what previously was considered to need maybe $20, $30, $50 million to train a new version of some model now takes probably $5 to $10 million. So it's roughly a fivefold decrease. So this is one.
But again, if they kept this only for themselves, everyone would say, okay, well, they're lucky they don't spend a lot on their models, but so what? Others have money, so nothing changed. But what they did is they actually showed everyone how to do the same thing,
as a recipe, step by step. So now not only do you know that a very competitive model can be created with a small budget, you can create one for yourself. They published a public technical report, and already online you can find several independent implementations of the R1-Zero and R1 training. So anyone can do it. This is two.
The third is that not only are they cheaper to train, they're also much cheaper to run. So if you compare the pricing of, let's say, OpenAI for o1, it's about $60 for a million output tokens.
$60. So you will probably, I don't know, spend five minutes talking to it with a sufficiently long context, for example talking about some book or some article, and you will pay $60, like, in five minutes. And what DeepSeek showed is that for their model, they charge about $2 for a million output tokens. So again, it's like
a 30-times reduction in cost. So not just anyone can train it now, but anyone can run it, and it will be very efficient. So if you take all of this, they kind of gave state-of-the-art AI to anyone, to your brother, to your grandmother. So they can just
take it and have it. And this is what was considered OpenAI's or Anthropic's secret sauce.
And the final thing: when you create language models, it's not just about compute and it's not just about know-how. It's also about having the right data. And the data was always what we call the moat. Companies like OpenAI and Anthropic invested a lot in creating high-quality data for model fine-tuning. Because when you just pre-train it, it cannot talk.
It's just a next-word predictor. And then you must convert it into a chatbot, so it takes questions and outputs answers. And then you also must convert it into a problem solver. So it's not just that you ask questions and expect answers; you also have to have some sort of multi-step interaction where you solve a problem, for example a coding problem, jointly with the LLM. So to make LLMs act like this, you need examples of these problem-solving
conversations. And those examples should not be, you know, some random stuff. They should really be to the point: okay, let's solve this specific problem. So to create such examples, you need subject matter experts. And having such subject matter experts available to create hundreds, even thousands, even hundreds of thousands of such examples, it's a huge investment. So I think that they invested billions just in getting those conversations.
Now, DeepSeek, they have shown that you don't need that. So basically, their approach to training R1 is based on automated validation of solutions. For example, let's say you ask it to generate some code.
It generates this code, and instead of, like previously, asking a subject matter expert to look at the code and say, "Yeah, it makes sense," or "No, I don't like it, it's too long," they run the code, and once the code is executed, they take the output
and then compare it to the ground truth, to what was supposed to be the output, or just check that it compiled, that's it. It's a signal for reinforcement learning: one, it compiled; zero, it crashed. The same format for math: they have a math problem, they know the solution, the solution is 42, okay, and they ask the LLM that they train to generate a bunch of solutions.
And if one of them gives 42, they say one. For the rest of them, it's zero. For logic, it's the same thing. You can create a kind of logical derivation: you have this hypothesis, and then you try to derive whether this conclusion is true. You can verify the logic, so you can create this task automatically. Again, your LLM tries to solve this logic problem.
And the expected output is "the killer is the cook." So you check whether the output is "the cook," and if it is, the reward is one. If it's something else, the reward is zero. So creating such examples is very simple. For example, you take a GitHub repo,
you hide just one class from it, and you ask the LLM to fill in this class, to write it from scratch. And then you compile the full repo. If it compiles, cool, the reward is one. If it doesn't, the reward is zero. So you don't need subject matter experts anywhere in this pipeline. So they created hundreds of thousands, close to a million, such examples fully automatically.
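A hedged sketch of this verifiable-reward idea (not DeepSeek's actual code): the reward for reinforcement learning comes from an automatic check rather than from a human annotator:

```python
# Sketch of the verifiable-reward idea described above: a binary reward comes
# from an automatic check (does the code run? does the answer match?) instead
# of a human judgment.
import os
import subprocess
import tempfile

def reward_for_code(generated_code: str) -> int:
    """1 if the generated Python code runs without crashing, else 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=10)
        return 1 if result.returncode == 0 else 0
    except subprocess.TimeoutExpired:
        return 0
    finally:
        os.remove(path)

def reward_for_math(model_answer: str, ground_truth: str = "42") -> int:
    """1 if the model's final answer matches the known solution, else 0."""
    return 1 if model_answer.strip() == ground_truth else 0

print(reward_for_math("42"))  # 1
print(reward_for_math("41"))  # 0
```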
Again, it removes this moat that was previously only available to companies with big budgets. Now you can just recreate this training set at home and create an R1 named after you. So they kind of removed any competitive advantage the biggest players had. And they had this advantage
for two years. And even, I remember, a year ago, the OpenAI folks, before everyone left, gave an interview, and some journalist asked them, like, okay, you see there is a huge movement of open-source LLMs, are you worried that they could even undermine your business model? And they laughed it off, like, huh, no,
with someone working at home with a tiny budget, there is only so much they can do. They don't have data, they don't have compute. And now everything they mentioned is gone. So now we are kind of at square one. What's next? So this is the biggest revolution, I would say, that R1 brought. The model itself is good, but it's not about the model. It's about
the fundamental change in what an open LLM is: the notion of an open LLM has changed, and what it can do now.
Very well said. Earlier you mentioned open-weight LLMs; so Llama models from Meta, for example, are open-weight. They're not open-source because you can't see the source code. Would you say that these models from DeepSeek are actually open-source? Well, my personal opinion on this is much stricter than some others'. For example, Yann LeCun calls the Llama models open-source. In my definition they are not, because with LLMs you have to ask:
what does open source mean? Not in terms of a formal definition, but in terms of practical aspects. Open source in software means that anyone can reproduce your software
independently. So you put the source code online, anyone can download it, tweak it, adapt it a little bit to their system, run it, and get the same software as yours. But with LLMs, it's not like this. You cannot take just the model itself,
run it locally, and say that you reproduced it. No. The model itself is similar to a binary. For example, you download, I don't know, GIMP, the open-source graphical editor. If you just download the binary, that's not open source. Or you download Adobe Acrobat, or you download Word.
You can run it on your machine, but you don't call it open source. You can use it, you have access to its binaries, but you cannot tweak it. You cannot update it and make it different. And this is what open source is. So with open models, if you want to reproduce the model
at home, you need not just the code that was used, but also the data that was used, because the model is nothing without the data.
So from this perspective, these open-weight models are open weight, but they are not open source. And, as I mentioned in the beginning, there are maybe a couple of models that come not just as open weights: you can also download the full training dataset that was used to train them. Unfortunately,
this hasn't become standard practice when releasing new models, and those models with open datasets are no longer competitive today. So if you want, you can take such a dataset and train a new model on it, but by today's standards, a typical dataset for pre-training is between
probably 15 and 20 trillion tokens. So again, if you download some openly available dataset,
it will be maybe four or five trillion tokens, so you cannot really hope to reach state of the art when you have four times less data. Nicely said. Before I let you go, we did have some audience questions. We actually got a very long one here. I posted a week before we recorded that you'd be coming on the show, like I do with some upcoming guests,
And Dmitry Petukhov, who is a fraud prevention data scientist in Moscow,
He said that he wasn't aware of you before, and so he's grateful to us, to the podcast, for bringing another interesting personality with new additions into his book queue. So you can expect a few purchases there from Dmitry. And then he says a related question came to mind for him. He says, these days there's a lot of talk about the disruptive
and transformative effects of language models and generative AI on society and on technology in particular. And so for me, in the conversation that we've had today, Andrey, this seems to relate even to the kind of thing you were describing: how previously at Talent Neuron you were only concerned with the post-download side, but now you're able to apply LLMs to pre-download aspects as well.
Anyway, so then his question is, it would be interesting for me to hear Andre's thoughts about what effects these transformations have already had and will continue to have on the quote-unquote traditional machine learning project lifecycle. So he describes that traditional cycle as data gathering, quality checking, model developing, validating, deploying, monitoring, and then celebrating the results. And so how have LLMs, generative AI, disrupted
that traditional machine learning project lifecycle and how might it continue to? Well, I can tell... Dmitry was the name, right? Dmitry, yeah. Yeah, Dmitry. Okay, so I can tell Dmitry that for maybe this year, maybe part of next year, people will still...
pretend that LLMs out of the box work sufficiently well and we don't need to follow a traditional machine learning process where you gather the data, you select an architecture, you train, you tweak hyperparameters, you test, and you go back if you see that your initial approach wasn't good.
For some more time, people will still follow the hype and say, okay, we don't need to train anymore. But again, my team, we are four people, and we are all hands-on. My initial position was: okay,
because LLMs can do so much stuff out of the box, we should change the way we work on projects. So we should transition from the traditional training-based approach to prompt engineering and probably what they call few-shot learning or few-shot prompting, where you add examples directly in the prompt and this kind of tweaks the model's performance.
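As a concrete illustration of few-shot prompting, here is a minimal sketch; the task, the labels, and the examples are hypothetical, and the resulting string would simply be sent to whatever LLM is in use, with no model weights being updated:

```python
# Hypothetical few-shot prompt: labeled examples are embedded directly in the prompt
# instead of being used to train or fine-tune a model.
EXAMPLES = [
    ("Senior Data Engineer (Remote)", "job title"),
    ("About our benefits program", "not a job title"),
    ("Machine Learning Lead - Quebec City", "job title"),
]

def build_few_shot_prompt(text: str) -> str:
    lines = [
        "Decide whether the text is a job title.",
        "Answer exactly 'job title' or 'not a job title'.",
        "",
    ]
    for example_text, label in EXAMPLES:
        lines += [f"Text: {example_text}", f"Answer: {label}", ""]
    lines += [f"Text: {text}", "Answer:"]
    return "\n".join(lines)

# The prompt below would be sent to an LLM; the examples steer its behavior.
print(build_few_shot_prompt("Director of Product Analytics"))
```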
And what we concluded is that this approach has its benefits only in the beginning. For example, say you want to build a complex system, like the system that I explained, where we have some AI going to a corporate website and it must figure out how to create a robot for gathering
the data from it. In this complex system, you need multiple places where you would put machine learning. For example, you need to detect what link to click so that you reach the career section.
You need some classifier that would say you are where you are supposed to be and not somewhere else. You need a model that can say, okay, I see the job title. I see a location. I see X, Y, Z. So for all this, you need models. So imagine in the past, like, for example, five years ago,
You start a project like this, and for every single piece where you need some kind of AI-based decision, you would have to gather the data,
implement the full process of developing a model, just to put it in one place. And now you have, for example, five, 10, 15 places where you need to make such a decision. So before you can deliver, not deliver, but conceive a prototype model,
you would have to solve like 15 machine learning problems from scratch. It's crazy. It might take years. Larger teams, for example, can scale horizontally by adding people, so it scales for them. We are four; we cannot, you know, clone ourselves or hire
more people to train all those models in parallel. So for us, it would have taken years to develop. Now, thanks to LLMs, we can replace all those places where we need decisions with an LLM that we just instructed: you should predict whether this is a job title or not.
And this makes creating a minimum viable product, if you want, or some kind of production-like prototype, very fast.
But then, when you really want to go to production, you will not tolerate, you know, a 30% error in this place, a 25% error here, a 40% error here, because error has the property that it accumulates. So if you make a 15% error in the prediction here, and then 15% here, and then 15% here, the chance of reaching your destination tends to zero very fast.
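A quick back-of-the-envelope calculation shows how fast that compounding bites; the 85% per-step accuracy and the 15 steps are just illustrative figures taken from the conversation:

```python
# If each of 15 LLM-based decision points is right only 85% of the time,
# the chance that an item survives the whole chain collapses quickly.
per_step_accuracy = 0.85
steps = 15
end_to_end = per_step_accuracy ** steps
print(f"End-to-end success probability: {end_to_end:.3f}")  # about 0.087, i.e. under 9%
```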
So this is where LLMs are cool: for fast prototyping. You don't need to train your model; you can just instruct an LLM. But then, when you actually want to go to production, you will have to investigate, among all of your placeholder LLMs, which one of them is the
weakest, how do you call it, the weakest link in the chain. And you will have to replace this weakest link with a real classifier that you will actually control, where you will actually be sure that you can reach 95% accuracy, or 99% accuracy if needed.
And for this, you will create a model from scratch. You don't have to use LLMs all the time, or you can fine-tune LLMs, but in a real machine learning sense: you gather a dataset, you actually execute learning iterations, and you see how good the model becomes on a
holdout set, like a validation set. And once you are satisfied, you say, okay, cool, this piece now works as intended. So you can run your system in production already, but it will be kind of a
working prototype, or an alpha or beta version, whatever you want to call it. And then, in the future, you will replace those critical pieces with actual machine learning models. So LLMs are cool for fast prototyping. They are cool for, you know,
interactive problem-solving, like: okay, what if I try this, what if I try that? But then, when you go to production, you really want to follow a rigorous machine learning process.
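For hands-on listeners, here is a minimal sketch of what replacing one placeholder LLM with a traditionally trained classifier could look like; it assumes scikit-learn is available and uses a tiny hypothetical labeled dataset, so it illustrates the holdout-validation workflow rather than a production model:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples for one decision point: "is this string a job title?"
texts = [
    "Senior Data Engineer", "Machine Learning Lead", "Frontend Developer (Remote)",
    "Director of Analytics", "About our benefits program", "Contact us",
    "Our office locations", "Read the latest company news",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = job title, 0 = not a job title

# Hold out a validation set so we can check whether this piece "works as intended".
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

# Iterate on data and features until the holdout accuracy meets the target (e.g., 95%+).
print(f"Holdout accuracy: {accuracy_score(y_val, model.predict(X_val)):.2f}")
```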
Nicely said. Very well summarized there on a critical topic for a lot of our listeners who are themselves developing machine learning pipelines. And now, like you say, developing those 15 models with a team of four was previously intractable, and now you can come up with the right prompt
And poof, you have some level of accuracy that in some cases is acceptable out of the box. - It's cheap. Like you spend nothing and you have something. This is already better than zero, right? - Yeah, exactly.
All right, Andrey, this has been an amazing episode. I've really enjoyed learning from you today. You're really brilliant. It's been awesome. Before I let you go, I always ask my guests for a book recommendation. Well, I should say that I'm more a writer than a reader. I was a really fanatical reader when I was a teenager, and my dad has a huge collection of
science fiction and historical books. So I read a lot, but since I moved, I don't have a library myself, except probably the one that stores counterfeit copies of my books; I keep all of them. I think that the biggest impact on me was made by The Little Prince, and
I even put a quote from The Little Prince into my new book where the prince says to a fox that
the language is a source of misunderstandings. And I found it really to the point for the book, because, yes, you build those language models, but they can create more problems than solutions. And not just because of this: I think The Little Prince, for me, is a reminder that an adult can still remain a child
in their heart. And for me specifically, it resonates because I still feel as if I were 22 or 25 years old, even though I'm already 43. My kids have grown up, but when I read The Little Prince, not sometimes, but every time, it makes me want to cry, because I really feel this, you know,
atmosphere of a child stuck in an adult's life.
Yeah, the book is also a hugely influential one for me. I've been trying to use it more and more recently in guiding my professional decisions, to have more of a sense of play in my life and to ask things more like, you know, what's my favorite color? More so than, you know, how much revenue will this bring in? Right.
But yeah, so in addition to The Little Prince, an amazing suggestion there. You also, of course, and I'll just kind of recap, so my mistake for describing it as a trilogy: we have a duology, I don't know what the equivalent is for two, a duology with a spinoff. And it's always like this when I talk to my kids,
and they say, what about this book? I say, the second one? No, that is the third one. The third one if you count books, but the second one in the series. Yeah, so it's a duology. Yeah, so in your 100-page book series, you have the original, The 100-Page Machine Learning Book, which is iconic, and now your brand-new 100-Page Language Models Book as well.
And yeah, if people are interested, the spinoff that is not part of the trilogy, but is your third book, is the Machine Learning Engineering book that people can dig into to learn how to use machine learning to solve business problems at scale. So yeah, amazing books that you provided. Thank you also for the amazing perspectives you provided on a broad range of
of timely topics on today's podcast, Andrey. And hopefully when your next book comes out... I'm already thinking about it, but I should take a pause, because it's very exhausting to write books, especially when you challenge yourself and the book should be small but not superficial. So it's exhausting. I spent nine months writing this one, so I think I will probably take a break for a couple of months before I...
Yeah, definitely take a break. But, you know, if in a few years you have another one done, we'd be delighted to have you on the show again to discuss it. You have an open invite. So, Andrey, what's the best way for people to follow you after this episode?
Oh, it's not hard to find me. You can Google my name, Andrey Burkov, and you will find links to my LinkedIn profile and my Twitter profile. On LinkedIn, I'm more, how to say, professional, so I try to filter what I post. On Twitter, I'm more like myself, because Twitter is mostly anonymous, so
I can share some stuff without being linked to my employer. It was especially hard with Gartner, because Gartner had strict rules about online presence, so I had to limit myself very much in what I posted. But now, because we are not Gartner anymore, I'm much more open, even on LinkedIn. But if you really want to read my unfiltered stream of consciousness, join me on Twitter.
Yeah, Andrey, thanks so much, and hopefully we catch up with you again in the future. Thank you, John. It was a pleasure to be with you, and thanks for the questions. What an excellent episode with Dr. Andrey Burkov. In it, he covered how AI agents face fundamental challenges: they can't be effectively debugged when working together, they lack true agency, and they struggle with out-of-distribution tasks that weren't part of their training data.
He also talked about how LLMs are excellent for rapid prototyping of ML systems, but production-grade applications still require traditional ML development processes for critical components. He filled us in on how he achieved zero hallucinations in a Talent Neuron chatbot by using LLMs only for understanding user input, while relying on structured data and predefined templates for responses.
He talked about how DeepSeek revolutionized the field by reducing model training costs roughly five-fold, making their methods public, cutting inference costs by 30x, and eliminating the need for human experts in training data creation.
He also talked about how the key distinction between humans and AI is our ability to plan infinitely into the future and accurately assess what we do and don't know. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Andre's social media profiles, as well as my own at superdatascience.com slash 867.
And yeah, I've been saying this for a few weeks now, but it's now coming up in just two weeks: I will be speaking at the RVA Tech Data + AI Summit in Richmond, Virginia. It'd be awesome to connect with you there in real life. There are a ton of great speakers, so this would be a great conference to check out, especially if you're in the Richmond area. It'd be awesome to meet you there.
Thanks, of course, to everyone on the Super Data Science podcast team: our podcast manager, Sonja Brajovic; media editor, Mario Pombo; partnerships manager, Natalie Ziajski; researcher, Serg Masís; writer, Dr. Zara Karschay; and, can't forget, our founder, Kirill Eremenko. Thanks to all of them for producing another tremendous episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors. You can support this show by checking out our sponsors' links, which you can find in the show notes.
And if you yourself are interested in sponsoring an episode, you can get the details on how to do that by heading to johnkrohn.com slash podcast. Otherwise, share, review, subscribe, edit our videos into shorts if you'd like to, but most importantly, just keep on tuning in. I'm so grateful to have you listening and hope I can continue to make episodes you love for years and years to come. Until next time, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast.