GenAI hot takes and bad use cases

2025/2/24

Practical AI: Machine Learning, Data Science, LLM

People

Chris Benson

Daniel Whitenack

Topics

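Daniel Whitenack: I don't think fully autonomous generative AI agents should be built right now. It sounds tempting to imagine an AI agent handling every step of the sales process (finding prospects, gathering information, making contact, demoing the product, closing the deal), but in practice these systems are too error-prone and fragile to work well. For now, AI is best used as an assistive tool that helps sales professionals with specific tasks, not as a full replacement for them. I also don't recommend using generative AI for time series forecasting or any kind of prediction. These models lack an understanding of the real world and handle numbers poorly. They may do well on simple text classification tasks such as spam detection, but on more complex tasks, such as predicting future stock prices, they tend to disappoint. Even using a vision model on a time series chart may not accurately predict future trends. Finally, I don't recommend using generative AI for complete code rewrites or for developing whole software applications. Some tools (such as GitHub Copilot) work as coding assistants, but they can't fully replace programmers. For now, generative AI is better suited to small tasks than to building large, complex applications.

Chris Benson: I agree with Daniel. At the current level of the technology, fully autonomous agents are risky, especially in areas that involve sensitive information or require careful thought. Netflix's new series Cassandra illustrates that risk. I also don't recommend using generative AI for high-stakes financial trading or for applications that need real-time processing and have critical outcomes. Generative AI can be an assistive tool, but such applications should not rely on it entirely. In manufacturing, generative AI can be used to analyze quality assessment data, but it should not be used for real-time quality assessment. Finally, generative AI is not suited to any language outside the world's major languages or to culturally diverse content. These models are trained mainly on a handful of languages and carry cultural bias. Even simple tooling, such as a UI that supports right-to-left scripts, may not support every language.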

Chapters

This chapter explores the limitations of fully autonomous AI agents. The discussion highlights the current unreliability of these agents for complex tasks and the potential for errors, especially in sensitive areas. It emphasizes the need for human oversight and the benefits of using AI as a tool to assist human professionals rather than replacing them entirely.

Fully autonomous AI agents are unreliable for complex tasks.
AI should be used as a tool to assist humans, not replace them.
Current AI lacks the real-world understanding for high-stakes decision-making.

Transcript

Welcome to Practical AI, the podcast that makes artificial intelligence practical, productive, and accessible to all. If you like this show, you will love The Changelog. It's news on Mondays, deep technical interviews on Wednesdays, and on Fridays, an awesome talk show for your weekend enjoyment. Find us by searching for The Changelog wherever you get your podcasts.

Welcome to another fully connected episode of the Practical AI Podcast.

In these episodes where it's just Chris and I, no guests, we try to keep you updated with some of the things happening in the AI world, talk through some things that might help you level up your machine learning and AI game. So excited to dig in with you today, Chris. I'm joined as always by my co-host, Chris Benson, who is a principal AI research engineer at Lockheed Martin. And I'm Daniel Whitenack, CEO of PredictionGuard. How are you doing, Chris? I'm doing great.

I'm doing good. I'm looking forward to our conversation today. It's a snowy day in Georgia, and we can talk a little generative AI and talk about how you wouldn't want to use it unless it was snowing in Georgia, that kind of thing.

In the theme of coldness today, and it's also cold where I'm at, we can talk about the cold side of Gen AI. Actually, what we had talked about thinking through were the bad use cases for Gen AI, or where you shouldn't use Gen AI. Five or more bad use cases.

Yeah. And, you know, the funny thing about it is this is a topic that we have casually talked about a whole bunch of times, and we had not previously said, let's make it an episode. But I think it may be a little bit of a pet peeve, not only for us but for other people I talk to in the AI space: we're at this huge hype within Gen AI, and people just want to use it for everything that there could possibly be an AI application for. And there are so many places where it doesn't necessarily produce the best outcome for you. We talk about this casually all the time, so I'm glad that we're actually doing this in the show today.

Yeah, I was creating some docs for a customer of ours, and some training materials, and I have this section just labeled "here be dragons." So yeah, there might be some hot takes in here. I'm interested to hear what your takes are. My first one, so number one, a bad use of Gen AI, or maybe one that you want to avoid, at least for now. It's maybe a hot take, but I would say from my perspective, completely autonomous agents of any type are currently, and who knows how long this will be the case, but currently and for some time, generally a source of sadness for people when they try to create them. What I mean by autonomous agent is an agent or an automation that has no human in the loop, that is just running in the background, and you kind of hope that it does something for you. So it could be on the sales side, right? Oh, I'm going to have an agent do my whole sales process for me, and I'm just going to sit back and work on my product, and the agent's going to make all of the sales for me. Or maybe it's some sort of internal admin process that you're automating, or even all the way into manufacturing, with automation in plants or a more industrial case, whatever you're thinking of. My first one is autonomous agents. What's your thought, Chris?

Not only do I think that's right, I'm smiling in a big way, because I'm going to throw in something from the side just to support that. Apparently there is a new show on Netflix, and I just read about it last night in a news blog. Netflix AI is tough for me. The show is called Cassandra, and it's about a home assistant robot with agency in terms of doing lots of tasks. I have not seen the show yet, because I just heard about it, but apparently it gets very, very dark. When you were talking about that just now, in more of a real-world scenario, obviously, it made me think of that. And so, yeah, I agree. A completely autonomous agent in this day and age, with no guardrails around it, where you're just saying, go at it, generative AI, especially if it's dealing with anything that has any sort of sensitivity or requires a little bit of thoughtfulness to it? Yeah, not going there.

Yeah. Well, and I think even beyond the security and privacy related things, a lot of times I just see people trying to do this and it just doesn't really work that well.

Early days, early days.

Yeah, it's early days. For those that maybe have or haven't listened to previous episodes, when we're talking about an agent, we mean you give a task to some sort of system, and it has the ability to generate queries into other systems, like APIs or databases or data stores, to accomplish that task. And it loops over that task until it reaches an objective, right? In the fully autonomous case, just using the sales example because it's easy, you want an agent to decide how to find prospects for you on LinkedIn. Then you want it to gather a dossier about all of those prospects. Then you want it to initiate the contact, then pull off some type of demo or call, and then close the deal and do the contract arrangement, right? And it determines how to do every step of that process, basically replacing a human in their agency with the autonomous agent. Now, I think in that case we could say certain portions of that can be very interestingly addressed with AI functionality. Doing the prospecting, generating the dossiers, right? I would consider those good use cases if they're tied to a sales professional who's deciding how and when to do those things.
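The agent loop described here, with a person deciding when the outbound step happens, can be sketched roughly as follows. This is only an illustration under stated assumptions: the planner is a scripted stand-in for a model call, the tools return canned strings, and every name (`Step`, `scripted_planner`, `TOOLS`, `NEEDS_APPROVAL`, `run_agent`) is hypothetical rather than taken from the episode or any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    tool: str      # name of the tool the planner wants to call
    argument: str  # single string argument, kept simple for the sketch


def scripted_planner(task: str, history: List[str]) -> Optional[Step]:
    """Hypothetical stand-in for a model call: proposes the next step, or None when done."""
    plan = [
        Step("find_prospects", task),
        Step("build_dossier", "Acme Corp"),
        Step("send_email", "Acme Corp"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


# Canned tools; a real system would call APIs, databases, or data stores here.
TOOLS: Dict[str, Callable[[str], str]] = {
    "find_prospects": lambda query: f"prospects for {query!r}: Acme Corp",
    "build_dossier": lambda company: f"dossier on {company}",
    "send_email": lambda company: f"email sent to {company}",
}

# Steps that touch the outside world require a human sign-off.
NEEDS_APPROVAL = {"send_email"}


def run_agent(
    task: str,
    planner: Callable[[str, List[str]], Optional[Step]],
    approve: Callable[[Step], bool],
    max_steps: int = 10,
) -> List[str]:
    """Loop: ask the planner for a step, gate risky steps on approval, record the result."""
    history: List[str] = []
    for _ in range(max_steps):
        step = planner(task, history)
        if step is None:
            break  # planner reports the objective is reached
        if step.tool in NEEDS_APPROVAL and not approve(step):
            history.append(f"SKIPPED {step.tool}({step.argument})")
            continue
        history.append(TOOLS[step.tool](step.argument))
    return history


if __name__ == "__main__":
    # A reviewer who declines every outbound action; swap in real review logic.
    for line in run_agent("mid-size logistics firms", scripted_planner, approve=lambda s: False):
        print(line)
```

The prospecting and dossier steps run unattended, while the contact step only happens if the `approve` callback says so. Removing that gate is what turns the sketch into the fully autonomous case.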

In the imagination, it would be great to think of just letting that run in the background and you getting sales all the time, but it just doesn't really work very well. There's a lot of fragility in that type of system. When the system is determining objectives and determining how to interact with other systems, that produces a lot of errors. It's much, much more productive, at least currently, to have a tool that can help your sales professionals prospect, or a tool that can help them create these dossiers, that sort of thing. And certainly tie AI into that, but not this end-to-end, completely autonomous automation.

I totally agree with you. And by the way, just as a clarification of what I said earlier, I was not meaning to imply agents would typically have a robotic body, in case I confused anybody.

There's a lot of people exploring that.

There are. There are. You know, just one of the things to note: we're in the rise of agents right now, it's the hottest thing out there, but there are a lot of guardrail mechanisms out there. I know in the industry I work in, defense, especially in things like weapon systems, the DOD has guardrails around such things. So if you're listening and aren't familiar with that, but are a little bit worried about the world, fortunately there are people thinking along these lines.

Yeah. And there are, I would say, useful agents at this point, just not in that fully autonomous setting. AI systems that can connect to multiple things and are triggered by a human to do certain things, those are the most successful that I've seen.

Absolutely.

Number two from me, Chris. So we've got autonomous agents. Number two for me was time series forecasting, or really any sort of prediction mechanism, whether that's predicting future stock prices or reasoning over series of data and making predictions. There's some level of prediction that these models can do somewhat well, maybe things like general text classification, right? Is this message spam or not spam? You can give some examples and get reasonable output from a model like that. That's why I honed in on time series forecasting specifically. I know that there's research in this area, using transformer models for time series forecasting, but when I think of Gen AI, I think of, I'm going to log into ChatGPT, or I'm going to use DeepSeek or one of these models. And if you paste in a bunch of time series data and try to create a forecast just with the Gen AI model and nothing else, then I think that's going to end, again, in sadness for you. It's not going to work so well.

Yeah, I think so. I actually had that on my list too, in the form of high-stakes financial trading.

High-stakes financial trading.

Where do you want to put your million dollars today and see where it goes? So maybe explore some of the possibilities there, but I don't think I would leave it to an agent today to forecast or make that prediction on its own.

Yeah. I think people have shown that these models definitely don't have the kind of real-world grounding to take the reasoning steps needed to make reasonable predictions. They're also generally really bad with numbers. So you may be able to, even with a vision model, paste in a graph of a time series, right, and say, what month was my highest sales, if it's a graph of sales? And a vision model could reasonably return that value to you. But then if you say, well, now model out my sales for the next four quarters or something like that, I think generally that's not going to work so well. I guess you could argue that a model could generate code that uses forecasting packages to actually make a reasonable forecast over certain data. My general question then would be, well, it might be useful for generating your code to do it, but really it's not Gen AI that's doing that. It's statsmodels in Python or...

That's right.

Or, you know, Prophet from Meta and that sort of thing.

Yeah, and just in case that confuses anyone, there's the generative AI portion, which is trained on a general data set, and then there are these models that it might be generating code to access, which are designed specifically for that function. So those are two different things.

Yeah, the code that ends up being executed doesn't have anything to do with Gen AI, basically.

Right.

Yeah. And maybe it would be worth highlighting, in each of these cases that we talk about, Chris, some interesting tooling for some of these things. In the autonomous agents case, certainly workflows and automations can be created and executed. We had Prefect on the show, which is a workflow orchestrator that can be monitored and handle retries and all of that. That's a great thing if you're looking at workflows and orchestration applications.

Time series forecasting, my go-to has usually been Facebook's, or Meta's, Prophet package, which makes certain things pretty easy, but there are also many choices for that as well. So take a look through those things if you're interested in the non-Gen AI side.
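The point that the forecast comes from a statistical method rather than from the language model can be illustrated with a deliberately small baseline. This sketch fits a least-squares trend plus an additive seasonal offset using only the Python standard library. It is a stand-in for what a package such as statsmodels or Prophet does far more carefully, not a use of either library, and the function names and quarterly sales figures are made up for illustration.

```python
from statistics import mean
from typing import List, Tuple


def fit_trend(values: List[float]) -> Tuple[float, float]:
    """Ordinary least squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(values: List[float], season_length: int, horizon: int) -> List[float]:
    """Extend the fitted trend and add the average residual for each seasonal position."""
    if len(values) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    slope, intercept = fit_trend(values)
    residuals = [v - (intercept + slope * i) for i, v in enumerate(values)]
    # Average the detrended values that share a position in the season (e.g. all Q1s).
    seasonal = [mean(residuals[p::season_length]) for p in range(season_length)]
    n = len(values)
    return [
        intercept + slope * t + seasonal[t % season_length]
        for t in range(n, n + horizon)
    ]


if __name__ == "__main__":
    # Two years of invented quarterly sales figures.
    quarterly_sales = [11.0, 11.0, 13.0, 17.0, 19.0, 19.0, 21.0, 25.0]
    for quarter, value in enumerate(forecast(quarterly_sales, season_length=4, horizon=4), start=1):
        print(f"next Q{quarter}: {value:.1f}")
```

A language model could plausibly write code like this, but the numbers that come out are produced by the arithmetic, which is the distinction drawn in the conversation.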

Well, friends, AI is transforming how we do business, but we need AI solutions that are not only ambitious, but practical and adaptable too. That's where Domo's AI and data products platform comes into play. It's built for the challenges of today's AI landscape. With Domo, you and your team can channel AI and data into innovative uses that deliver measurable impact. While many companies focus on narrow applications or single-model solutions, Domo's all-in-one platform is more robust, with trustworthy AI results without having to overhaul your entire data infrastructure, secure AI agents that connect, prepare, and automate your workflows, helping you and your team to gain insights, receive alerts, and act with ease through guided apps tailored to your role, and the flexibility to choose which AI models you want to use. So Domo goes beyond productivity. It's designed to transform your processes, helping you make smarter and faster decisions that drive real growth. And it's all powered by Domo's trust, flexibility, and years of expertise in data and AI innovation. And of course, the best companies rely on Domo to make smarter decisions. See how Domo can unlock your data's full potential. Learn more at ai.domo.com. That's ai.domo.com.

All right, Chris, on to number three. My third one was: do not use Gen AI to do complete code rewrites or the complete development of your software applications. Thoughts?

Oh, I've tried that, just playing around, and I definitely don't think that's ready for prime time, despite the fact that, as we sit here and say this, there have been quite a few CEO luminaries out there who have been advocating that over the last year or so. When I sit down and try to do that, I get varying results, and how good it is depends largely on how mainstream a language is, for instance. But I haven't gotten anything that I would say is a production-grade, fully functional program through nothing but generative AI. Just toy programs.

Yeah, without interaction.

Right. Yeah.

Yeah, I know this is advancing quickly, so who knows how dated this conversation will be in a few months. But I think we've been talking about this for some time now, and we've seen things like Devin and Cursor and these sorts of things come out, which are pretty amazing and do a lot of really interesting things, but often don't deliver that full "I'm going to prompt and get a software application out of it" result. There's more to it than that. So I think sometimes people are maybe a bit disillusioned. A better way to think about this: there are amazing agents and tooling that have come out, like Devin, Cursor, All Hands, Windsurf, et cetera, that can provide a huge acceleration in your code development, I think, if you treat them like code assistants, and maybe even junior developers that you are pairing with. Right? So it's not that I'm a complete non-developer, I have no technical skills, and I just say I want this application and it is generated for me. That's really what I mean when I say complete app development. Gen AI, from my perspective, is not capable of that right now, or you should not rely on it for that right now. There may be interesting demos and cases where some form of that is shown. But for the most part, I think thinking of the technology integrated into your code and programming as an assistant, and even a highly functioning agent that you can pair with, is a good model. I guess maybe it's a specialization of the autonomous agent thing that I mentioned before.

Sort of. And I think you're making really good points, in that you can't just toss it over the wall and say, here's an instruction, do it all, and generate a complex set of programs and stuff. I have done tasking on small things very successfully, but the scope of what they were addressing was constrained. And I think we are there for things like that, for doing small bits. Many years ago I would write VBA code, Visual Basic for Applications, for Microsoft stuff. I don't much anymore. So now I can generate something like that if I happen to be working on something in Office, to put something together at work. But when I'm actually coding up a large project, it's very helpful to have different tools on this, but I've not yet found one with which I was able to successfully do a significant coding effort by itself, just tossing it over the wall. So I agree with you completely. It will be interesting to see where we are a year from now, two years from now.

Yeah, definitely. I would encourage people to check out things like Windsurf and Devin and All Hands and Cursor and all of these things. Super cool. Try them out.

But don't expect that, if you're not a programmer or don't have at least some minimal level of skill, you're going to create a huge application or project with all of its intricacies and have that work and scale well.

Fair enough.

All right, Chris, what are we on? Number four for me on the list of "don't do this with Gen AI," or bad Gen AI use cases, is anything extremely high throughput, low latency. Of course, small models and very high throughput advances have taken place with Gen AI models. But still, if you're doing quality assessment of products coming off of an actual scaled-up manufacturing line, where you maybe have to do the assessment of each of those products in a fraction of a second, you really don't want to be reasoning over that data with a Gen AI model and take 10 seconds to generate your quality assessment for the product. It's just not feasible.

Yeah, I would agree with that. And I actually have a subset that I'll throw in on that, that I think fits in there, which would be real-time applications with critical outcomes.

Yep. You know, that's a great way to phrase it.

I think that's an area where you may have generative AI as a component in that mix, but you're going to have to have some guardrails around it, and you're going to have to have some specialized models to keep things on track. In a real-time app where things matter on the tail end, it's great to use, but you don't want to rely entirely on it. When it goes off the rails, you need some way to catch it. It doesn't take any time.

Yeah. And I think you make a couple of great points. Part of it is around the latency, which I highlighted. These models just don't operate fast enough, and in many cases they don't operate in the types of environments that you need them to operate in for these maybe edge use cases as well.
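The latency argument can be put in rough numbers. This is a back-of-the-envelope sketch only: the 10-second figure comes from the conversation, while the line rate of 5 items per second and the 20 ms latency for a small dedicated vision model are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def per_item_budget_s(items_per_second: float) -> float:
    """Time available per item if each item is assessed before the next one arrives."""
    return 1.0 / items_per_second


def workers_needed(items_per_second: float, latency_s: float) -> int:
    """Parallel model instances needed just to keep up (arrival rate times latency)."""
    return math.ceil(items_per_second * latency_s)


def items_passed_before_verdict(items_per_second: float, latency_s: float) -> int:
    """How many further items have arrived by the time one verdict comes back."""
    return math.floor(items_per_second * latency_s)


if __name__ == "__main__":
    LINE_RATE = 5.0  # items per second (assumed)
    for name, latency in [("Gen AI model", 10.0), ("small vision model", 0.02)]:
        print(
            f"{name}: budget {per_item_budget_s(LINE_RATE):.2f}s per item, "
            f"latency {latency}s, "
            f"{workers_needed(LINE_RATE, latency)} worker(s) to keep up, "
            f"{items_passed_before_verdict(LINE_RATE, latency)} item(s) arrive before a verdict"
        )
```

Adding parallel workers can restore throughput, but it does not shorten the wait for any single verdict, so under these assumed numbers a slow model still reports on a product long after it has moved down the line.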

But also, these models do what they are supposed to do most of the time, right? But still, if you train a computer vision model, for example, to do that manufacturing task, that could run on CPU at extremely high throughput and have a much higher accuracy than any generalized vision model out there. It wouldn't even need a GPU to run.

Right. I agree with that.

Yeah. So the separation between those two cases is still just really, really high, in terms of those kinds of use cases merging. Now, I do think that in a manufacturing scenario, or any of these other high-throughput, critical types of scenarios you might think of, Gen AI is very useful. Maybe just not for that high-throughput, low-latency piece, but certainly for staff at the manufacturing facility who want to look at and analyze the data coming off of the quality assessment system and ask questions like, hey, I see this alert, pull this data for me to help me understand what's going on. Or, have any of these types of events happened in the past X amount of time? That query-level side, via natural language, can be very powerful, for example. And there are many other things that you could do in those scenarios.

I'll extend this just a little bit. As you know, my personal passion is in autonomous platforms, especially at massive scale, swarming, things like that. And when you talk about that, one of the areas where I think Gen AI does play is exactly the equivalent of what you just said on the manufacturing side, and that's having a human in the loop or on the loop who's able to interact. So you're using Gen AI to enhance the communication with the human who is in control, or on the loop and able to step in, but not so much in the other areas, especially considering that you have lots of vehicles, and this could apply to lots of different use cases, both in the commercial space and the military space, where you have a lot of different platforms or vehicles in communication, which requires high throughput. But yeah, I think the one big space there is in those interactions with the humans that are involved, for safety.

Yeah, for sure.

Well, I have one more, Chris. A last interesting bad use case for Gen AI. The one on my list was anything outside of the major languages of the world, so anything with any sort of linguistic diversity or cultural diversity. Essentially, the models of the modern Gen AI era maybe work well in the top five to 10 languages of the world, but there are 7,000 spoken languages in the world, which means they basically don't work for any of the languages of the world except for a handful. Moreover, the cultural context of the models is driven mostly by what has been gathered either from the internet or by Western tech companies, maybe Chinese tech companies. So there's certainly a bias against certain cultural contexts and languages. And even if you think about vision or video models, I'm sure the same is true, right? Because certain things just aren't represented there. The reality is that it would be great if you could land anywhere in the world and use ChatGPT or whatever to help you interact in X country in Africa or Y country in Asia, and have that work really well with whatever languages you might encounter. But I would say generally that's not the case as of now.

I think so. And I know you haven't mentioned it yourself, but longtime listeners who have been with us for years will know that you used to be in that space in a former professional life, and know quite a bit about this topic that you've just brought up. So yeah, I agree. I don't think that's changed substantially over the last few years.

Yeah. And there are even simple things. I mean, it has to do with Gen AI, but it also has to do with the tooling around it, right, in terms of even other scripts. Arabic in particular, for example, which of course is a major language of the world, and which at least some models can do reasonably well at to some degree. Take the tooling around the Gen AI ecosystem, like, oh, I want to download this chat SDK, or this UI that I can plug a custom model into. It's likely not going to support right-to-left. Potentially there are going to be some issues with the script and other things. So it's just another highlight of this disparity that exists. I think it's worth highlighting, because mostly what we're talking about here is language models, and really language models that support a very small number of the languages on the planet. But that's what I had. Chris, any thoughts after going through the list of bad?

I think I do have a few thoughts. One of the things that I've noticed is that there are kind of high-risk areas, where you have significant outcomes that can affect you in a major way. And whether it be financial or manufacturing or my industry with defense or whatever, you don't want to put a general generative AI model in charge of doing things for which there are no guardrails. That is a thing that I have noticed across a lot of these. And I could throw out a couple of other areas where I think that applies, like high-stakes legal advice. Do you have great tooling within things like ChatGPT and the other big language models for legal advice? Yeah. But would you really want to literally put your life savings at risk with things like that?

Maybe not. Maybe not today, at least. You see a lot of AI pervading medical diagnosis, and once again, I think there's a very good use for those, but probably not by itself, in isolation. So in any of these areas where you have a substantial risk in the outcome, in terms of good and bad, you probably want to have guardrails around it, across many, many different industries. I think that's my takeaway. And I think that things are continuing to improve at a really, really rapid pace. We've said things and, two months later, had the world change out from under us, and that may happen again here with some of these. But yeah, we're on a learning curve with these things. They're getting better, but they're not all the way there yet.

Yeah, I think that's a great way to summarize, Chris. Thanks for chatting through these things with me, and we'll look forward to carrying on the conversation very soon with you.

Sounds good.

All right, that is our show for this week. If you haven't checked out our Changelog newsletter, head to changelog.com slash news. There you'll find 29 reasons, yes, 29 reasons why you should subscribe.

I'll tell you reason number 17: you might actually start looking forward to Mondays. Sounds like somebody's got a case of the Mondays. 28 more reasons are waiting for you at changelog.com slash news. Thanks again to our partners at Fly.io, to Breakmaster Cylinder for the beats, and to you for listening. That is all for now, but we'll talk to you again next time.
