#417 Bugs hide from the light

2025/1/21

Python Bytes

People
Brian
Python developer and podcast host, focused on testing and software development education.
Michael
Python developer and educator; founder of Talk Python Training and co-host of Python Bytes.
Topics
I built a tool called LLM Catcher that uses large language models (LLMs) to diagnose exceptions in Python and FastAPI applications. Its main job is to hand exception information from a running program to an LLM and have the model explain what caused the exception and how to fix it. You can use LLM Catcher in several ways:
  • Use a function decorator to diagnose exceptions automatically.
  • Call the diagnose function manually in a try/except block.
  • Register a global exception handler to catch unhandled exceptions.
LLM Catcher supports local Ollama models and OpenAI's cloud models, offers both synchronous and asynchronous APIs, and can be configured flexibly through environment variables or a configuration file. It helps developers understand and resolve exceptions quickly, improving development efficiency.

Chapters
LLM Catcher is a tool that uses LLMs (like Ollama or OpenAI) to diagnose errors in Python applications, especially helpful for complex or poorly documented code. It integrates with different error handling methods and offers both synchronous and asynchronous APIs.
  • Uses LLMs for exception diagnosis
  • Supports local (Ollama) and cloud-based (OpenAI) models
  • Offers various integration methods: decorators, try/except blocks, global exception handler
  • Provides both synchronous and asynchronous APIs

Transcript

Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds and mine. And this is episode 417, recorded January 21st, 2025. And I'm Brian Okken. And I am Michael Kennedy. And we're excited about this show today. And ain't nothing going to bring us down. So...

But before we get started, I want to thank everybody that has supported us: through Talk Python Training, through pythontest.com, the courses, through buying my book, our Patreon supporters, of course, you rock. And of course, many of the sponsors that have sponsored us in the past, and we love them too. But we also love people that support us directly.

If you'd like to send us topics, please do so. There's a contact form on our website, but also you can send them to us on Bluesky or on Mastodon, and those links are in the show notes. And if you are listening to this, thank you, and also share it with a friend. And if you'd like to join us live sometime, check out pythonbytes.fm slash live to see when the next episode's going to be

filmed and recorded. And you can join us and comment while we're going live. And thank you also to people that subscribe to the email newsletter. If you go to pythonbytes.fm, you can subscribe there as well and get the list of all the topics directly in your inbox so you don't have to go look those up.

Yeah, we're evolving the format of that a little bit, trying to do a little deeper analysis, but also keep it skimmable. And yeah, it's a huge resource. I think it's great. Yeah. People listen as well, but it's also nice to just have that written down in one place. And we cover lots of great topics every week. And what is our first topic this week, Michael? The first topic will be LLM Catcher. The name, not terribly descriptive of what it actually does, but here's the deal.

I'm sure everyone has done this at this point. I know I've done it recently as I was yelling at the Boto3 API, because ain't nothing as frustrating as a little bit of Boto: auto-generated, no comments, no documentation, no idea what parameters go in it. Anyway, you might take those errors and pass them over to an LLM and go, please, dear ChatGPT, Copilot, Anthropic, whatever, what is going on here? What am I missing, right?

And it's super helpful. But this project is a little bit different. It's like a gateway to those types of things. So here's what you get. If there is a crash, obviously you have stack traces or tracebacks, depending on the language you're in and how you say it. They describe it here as the unsung villains of debugging.

Why wrestle with a wall of cryptic error messages when you could let LLM Catcher do the heavy lifting? So here's the thing. I'll go down here somewhere. What you can do is, in your try/except blocks, you can say, given an exception, diagnoser.diagnose, passing the exception, and it will pass those details over to various LLMs and say, help me understand this, and print out a message that will show me how to fix it, not just the traceback. Okay?

So I don't know if I can find any. I'm excited about this. Yeah. I think it's pretty dope. I would not use it in production, though you could, if you want your logs to have messages about here's actually what happened. It's your debugging sidekick. So what you do is you can run Ollama locally, and that's the default. Or if you give it your OpenAI API key, it can pass it over to whatever level of model you have access to.

It'd be awesome if you could have o1-mini or something like that diagnose it over at ChatGPT. So there are different ones that it'll work with, but basically when it gets an exception, it says,

Hey, I'm working on this thing with FastAPI and I get this exception. Help me figure out what's going on. So the Ollama one is a local, free, just-running-on-your-machine version. OpenAI, well, we all know about ChatGPT, right? So you can put it as a decorator on a function. You can manually do it in a try/except block, or you can even register a global exception handler. So anytime an exception goes un-

caught in your system, it'll diagnose it. It has both sync and async APIs, and you can set it up through environment variables. So it shows you how to pull down the Qwen 2.5 Coder model for Ollama, which is pretty excellent. And just off it goes. Look at that. So you've got your diagnoser.catch on a risky function, or in your try/except block, you just say diagnose or async diagnose.

Because it's going to run for a while. It's going to make an API call either locally or out to ChatGPT, so you don't want to necessarily block your system. So you just make it a little async await. Boom. Off it goes. Yeah. There you go. That's pretty much it. You can get formatted or unformatted information back. So if you need plain text to go in some kind of JSON field, you can do that. Or you can get it with proper formatting to make it more readable. What do you think, Brian? Yeah.
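For a rough feel of what that looks like, here's a minimal sketch of the three integration styles described above. The import path and method names follow the episode's description of LLM Catcher (diagnoser.diagnose, diagnoser.catch), so they may not match the library's current API exactly; treat this as a sketch, not gospel:

```python
# Sketch only: names follow the episode's description of LLM Catcher
# (diagnoser.diagnose, diagnoser.catch); the real API may differ.
from llm_catcher import LLMCatcher  # assumed import path

# Defaults to a local Ollama model; an OpenAI API key would switch it
# to a cloud model instead.
diagnoser = LLMCatcher()

# 1. Decorator style: exceptions raised inside get diagnosed automatically.
@diagnoser.catch
def risky_function():
    return 1 / 0

# 2. Manual style, inside a try/except block.
try:
    risky_function()
except Exception as exc:
    # Sends the exception details to the LLM and prints an explanation
    # of what went wrong and how to fix it, not just the traceback.
    diagnoser.diagnose(exc)

# 3. Async style, so the LLM call doesn't block your event loop:
#     await diagnoser.async_diagnose(exc)
```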

I'm going to withhold judgment on this until I give it a shot at some point. Yeah. You can even specify the temperature, aka the creativity you want the model to apply to your analysis. That's funny. Yeah. It's an OpenAI thing. I'd like it to, on any exception, just upload my entire code base and rewrite my code to fix the error. Exactly. Don't diagnose it, just fix it. Just fix it, man. Why am I even in the way here?

Yeah, so look at it. Yeah, you can even do the full-on o1 model of ChatGPT, which is like the really, really... Is that the $200 one? That's the $20 one, but you only get to call it 50 times a week. So not too many errors. If you get the $200 one, then you can call it all day long. Yeah, I'd like people to get the $200 one, put this in your CI, and do it over all versions of Python so that we just fill up all of the...

Then we'll get an announcement of, "Oh, the entire West Coast is blacked out because we broke the power grid." Yeah. But anyway, I think it's interesting. Just plug that in. Yeah. It looks like it might be kind of fun. Yeah. It does look kind of fun. This was recommended by Pat. Thanks, Pat, for sending that in. Pat Decker. Oh, and Pat's here. Thanks, Pat. Yeah.

Well, I kind of want to talk about bad packages a little bit. Like no Christmas presents for them or what's going on? Yeah, no wrapping paper. No.

Actually, we are talking about wrapping. I want to talk about the Python Package Index and malicious stuff. Let's scroll down here. There was a security and safety engineer first-year-in-review post. This was from Mike Fiedler. He talked about a lot of stuff, but one of the things he talked about was quarantining. This came out in August, but I'm just catching up.

So it's like if they catch COVID, or what's going on? No, it's, you know, it's bad packages. So if somebody says there's malware in a package that shouldn't be there, what do we do with it? They used to have the option to investigate it and then yank it, but that just sort of makes the whole thing go away. There's a new process, and they just recently, at the end of December, wrote about it. It's called project quarantine.

And we're linking to an article that really talks about it. So if you're worried about malicious packages and you're curious about what PyPI is up to, go ahead and check this out. I'm not going to go through the whole thing. However, it is kind of interesting. So the idea is, if we jump back down to, like, future improvements in automation, hopefully we'd have some sort of automated way. But let's say a couple people report that a package has malware in it.

Administrators of PyPI can apply some kind of litmus test and say, rather quickly, let's get this under control. And the quarantine doesn't delete the whole thing. There's a simple API that an admin can use to say, hey, we're going to quarantine this project. And the package goes into quarantine. And at that point, a

bunch of stuff happens. It's not installable, but the owner can still see it. I don't know if the owner can make changes; actually, it's not modifiable while it's in quarantine, but they can see what's going on. Administrators can look at it and determine whether or not there really is malware there. And

it's possible that, you know, we might have some bad actors reporting packages. So we don't want people to report stuff that's fine and have things removed just because they're angry about it or something, but that hasn't happened yet. So this has been in place for a little while. And looking at the statistics: since August, when they put this in place, there have been 140 reported packages.

They've gone into quarantine, and only one of them exited quarantine. And why was that? There was obfuscated code in there, and that's a violation of the PyPI acceptable use policy. The project owner was contacted, and they fixed it, because they just, I guess, weren't aware that you can't do that. Really interesting. I didn't know that was a policy. Yeah.

Yeah. Well, I mean, it should be. What if you want to ship something that you... I know there are companies out there that would say, we would like to obfuscate our code, but we still want to make it available. But we don't do it through PyPI. I guess, don't do it through PyPI, then. Okay. Now, I don't want to obfuscate code, and I understand that it's primarily a shield malware hides behind, right? Yeah. They'll have a base64-encoded string of something or other, and then they'll decode it and execute it, and off it goes. Yeah.

So, yeah, they created some outreach templates. So the full process, if you're confused, or if you get notified by an administrator that one of your packages is in quarantine, they'll probably point you to this anyway, but, you know, check this out. I'm glad that they're working on this, making the environment easier for PyPI admins to deal with, but also just safer for everybody to use. So that's good. Yeah. Excellent.

Well, you know, I'm sure you're aware of this, Brian. Testing. Testing makes your code safer to use. Yeah. And I have fully embraced the async lifestyle these days. You know, I talked about rewriting Talk Python in Quart, the async version of Flask, and I blogged about that and brought it up

on the show, I'm pretty sure. But how are you going to call APIs? I'm working on some projects right now that are, like, all about calling APIs. And I'm like, oh my gosh, so many APIs; this thing calls that, and so on. If you can do that asynchronously, that'd be awesome. I'm a fan of Requests, but the one I want for an async story these days is HTTPX, which has got basically very, very similar, not identical, but very, very similar behaviors and API patterns as

Requests, but also has an async variant: you create the async client and then await all your calls, which is great.
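As a quick refresher, the async variant looks like this (this is just HTTPX's documented pattern, nothing hypothetical beyond the example URL):

```python
import asyncio

import httpx

async def main():
    # The async client mirrors the familiar requests-style API;
    # you just await the calls inside an async with block.
    async with httpx.AsyncClient() as client:
        response = await client.get("https://pythonbytes.fm/")
        response.raise_for_status()
        print(response.status_code)

asyncio.run(main())
```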

So you might want to test that, right? Even asynchronously, as you run your code async. So I want to introduce people to RESPX, like "response-x"; that's probably the way you pronounce it. R-E-S-P-X. And what it does is it lets you mock out HTTPX requests. Super, super easy, however you like. So for example, if I want to make a call where I say httpx.get, and I want to make sure that if that URL comes in, it's going to return some particular value like a 204, you just say respx.get, the

same function call with the values. And then you just say .mock, and you set the values or the behaviors that you want it to have. And off it goes. That's pretty cool. Yeah. And it also comes as a pytest plugin if you want to roll that way. So then you just say respx_mock dot whatever and just call the functions. And then all the examples here are like...

First line, mock it. Second line, call it. But probably you're testing some function that then internally is using HTTPX through, like, an async with block. There's a lot of layers going down there that you might need to work with. And so that would be a more realistic example: you call the mock, and then you call your code, and then something happens. So that's pretty cool. You can even use marks. Make sense of this pytest.mark statement here for me. What are we doing? Yeah.

What do you mean? Okay, so you've got pytest marking it with RESPX. So the project has defined a custom mark, and it's passing in the base URL of foo.bar. Yeah, so you don't have to, I guess, say the base URL each time, right? Right, because you're just passing it in. Because it's really not that hard to pass a variable in through markers.

So that's what's going on. So you kind of prepare it with your mark here. Okay. Awesome. Yeah. And then the fixture is passed in. Okay, cool. And that's pretty much it. There's not a lot to say about it. But if you need to mock out HTTPX, instead of using generic mock stuff, you can use this library that basically has exactly the same API as HTTPX. Pretty cool.
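Putting the pieces together, here's roughly what the mock-it-then-call-it pattern and the marker/fixture combo look like, based on RESPX's documented API (the foo.bar URL is just the docs' placeholder):

```python
import httpx
import pytest
import respx

@respx.mock
def test_thing_returns_204():
    # First line: mock it. Any GET to this URL now returns a 204.
    route = respx.get("https://foo.bar/thing/").mock(
        return_value=httpx.Response(204)
    )
    # Second line: call it. In a real test this call would happen deep
    # inside the code under test, not directly like this.
    response = httpx.get("https://foo.bar/thing/")
    assert route.called
    assert response.status_code == 204

# The pytest plugin flavor: the marker pre-configures the base URL,
# and the respx_mock fixture is passed in.
@pytest.mark.respx(base_url="https://foo.bar")
def test_with_fixture(respx_mock):
    respx_mock.get("/thing/").mock(return_value=httpx.Response(204))
    assert httpx.get("https://foo.bar/thing/").status_code == 204
```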

Sometimes I forget that not everybody has completely internalized the entire content of my book, but... Well, we can work on that. We can work on it. I learned something new. Oh, really? You know what? I think if this is your next topic, I had no idea about this either. So I'm about to learn something new. Okay. Well, so this is actually something that Rodrigo also learned, because he marked it as a TIL, for "today I learned." And I kind of love people posting TILs, but also,

I'm personally somebody that... I don't think you need to prefix things with TIL for "today I learned." If you just have a small blog post, go ahead and post it. I like small posts. Anyway: unpacking keyword args, or kwargs. I usually just say keyword args. Do you say kwargs? I say K-W-args. Okay. I know people say kwargs, but I don't know, it sounds like I'm speaking Klingon or something. I don't do it.

Yeah. It makes me think of Deep Space Nine with Quark. But anyway, unpacking keyword args with custom objects. So there are a couple of things here. Unpacking: we're talking about the star, or the double star, or the splat-splat, or double splat, however you want to say it. So let's say you've got a dictionary, and you want to pass the contents of the dictionary as arguments to a function or something.

That's how we often use it: doing a star-star with a dictionary, and it unpacks it into keyword args for a function call, which is cool. Or, here's an example of merging two dictionaries with this. I don't usually do this much, but cool.

Cool, you can do that. There's a newer syntax where we use the pipe on dictionaries as well, and that's the same thing. There are, like, three or four ways to do this these days. Yeah, because with Python, there should be one obvious way to do it. And if there's not, there's four. Unless it's strings, then there's six. Okay. So there are a lot of times where doing this star-star unpacking is so cool and convenient.
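For reference, the handful of ways just mentioned, side by side:

```python
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 8001}

# Double-splat merge: later keys win on conflicts.
merged = {**defaults, **overrides}    # {'host': 'localhost', 'port': 8001}

# The newer pipe syntax (Python 3.9+), same result.
merged_too = defaults | overrides

# And the classic use: unpacking a dict into keyword arguments.
def connect(host, port):
    return f"{host}:{port}"

print(connect(**merged))  # localhost:8001
```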

But if you have custom objects, not dictionaries, if you have your own objects, how do you deal with that? Can you do that? Yes, you can, and that's what this little TIL is about. All you have to do is add a keys method to your object or your class. And the keys method needs to return an iterable. In this case it's just a list, and a list is an iterable.

And then in the example, he's got a Harry Potter class. It was returning first, middle, and last. And then a __getitem__ that, presumably, takes a key and returns something. And that's all you need. And then you can do this double-splat thing and it works. Oh, that's awesome. And the example is also good to remind everybody,

when you're doing the __getitem__, to go ahead and have an else clause that raises a KeyError, so if people pass in the wrong thing, they get the appropriate exception. So anyway, thanks. Yeah, I love it. Very, very cool.
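Here's a small self-contained version of the idea, modeled loosely on the post's Harry Potter example (the class and field names here are illustrative, not the post's exact code):

```python
class Wizard:
    """Loosely modeled on the TIL's Harry Potter example."""

    def __init__(self, first, middle, last):
        self.first, self.middle, self.last = first, middle, last

    def keys(self):
        # Must return an iterable of key names for ** unpacking to work.
        return ["first", "middle", "last"]

    def __getitem__(self, key):
        if key in self.keys():
            return getattr(self, key)
        else:
            # Wrong keys should raise KeyError, the exception callers
            # of a mapping-like object expect.
            raise KeyError(key)

def full_name(first, middle, last):
    return f"{first} {middle} {last}"

harry = Wizard("Harry", "James", "Potter")
print(full_name(**harry))  # Harry James Potter
print({**harry})  # {'first': 'Harry', 'middle': 'James', 'last': 'Potter'}
```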

All right. That's it, I guess. You feel pretty extra, I can see. I do feel pretty extra. I've got more extras than I had normal things. So let's jump in. Let's do it. Over on pythontest.com, oh, a couple of things. I'll just kind of go backwards. First off, I finally fixed it. I had an X up for Twitter, and I don't do Twitter anymore. So I replaced it with a Bluesky icon. And also my...

contact form has Bluesky now. So I fixed those things. Also, I had, like, incorrect podcast stuff up. So I fixed my podcast data, Testing Code and Python Bytes and stuff. Of course. Anyway, that's not what I really want to talk about. What I want to talk about is the top pytest plugins. I've been researching a lot of the stuff in here for Testing Code season two. And for this data,

I'm relying on the top PyPI packages list. This is an excellent resource, and it uses BigQuery. And there was just a new article from the person that created it, Hugo.

He wrote an article about what's going on with this, "A surprising thing about PyPI's BigQuery data," and it's interesting and also kind of exciting news. So the interesting thing is, he's using the free tier of Google's BigQuery.

Whatever... you need a Google account, you get a few BigQuery queries for free, and if you do it too much, they kick you out. And so at first he started with 4,000 projects, then bumped up to 5,000 projects, and then 8,000 projects. But there are more than that. So he's like, well, I wonder how much I can do. And so this is a little test that he went through. I'm going to jump down to the

punchline. And the punchline is that he went up and tried a million packages. There aren't a million packages, but it returned 531,000 packages. And it was the same bytes processed as even just doing one for 30 days. So it turned out it didn't really matter how many packages the query asked for; what mattered was the date spread.

If he did, like, five days, it was way cheaper than 15 days, which was way cheaper than 30 days. And it's relatively linear. So it looks like what he's going to do is change it so that we get, like, a ton of package data, as much as we can get, 531,000. But he's probably going to report that in

smaller chunks too, because a lot of people aren't going to want to see the top 531,000; the top 8,000 is probably sufficient. I'd have to zoom out to see them all at once. So I'm excited, because I'm using the 8,000 dataset, and of the top pytest plugins, there are currently 133 in the top 8,000. And I'd like to have a bigger list. So yeah,

if I've got the top, like, 10,000 or 20,000, I could probably get a bigger list of plugins. Anyway, that's it. It's just an interesting thing: if you're querying the BigQuery data, it's the date range that is the big driver of the price, right? Because it probably counts the number of downloads for each day, or per download individually. Whereas, you know, there are only 500,000-some packages, but

there are way more downloads than there are packages. Yeah. And the other thing that might change, I think, is that it costs more to filter on just pip downloads, and now we're getting a lot of uv,

people using uv to download stuff from PyPI. And so he wants to include that too. So I think he's going to change it so that the data is from everything instead of just pip. That makes sense. Yeah, it definitely does. Yeah. Anyway. Awesome. Cool.
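For the curious, here's a minimal sketch of the kind of query involved, using the public PyPI downloads dataset and the google-cloud-bigquery client. The actual query Hugo runs is more involved; the point here is that cost tracks the timestamp range you scan, and the installer filter is where pip-versus-uv comes in:

```python
from google.cloud import bigquery

client = bigquery.Client()  # requires a Google Cloud account and project

query = """
    SELECT project, COUNT(*) AS downloads
    FROM `bigquery-public-data.pypi.file_downloads`
    -- The table is partitioned by timestamp, so this date range, not the
    -- number of projects, is what drives how many bytes get scanned.
    WHERE timestamp BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 DAY)
                        AND CURRENT_TIMESTAMP()
      -- Drop this filter to count uv and friends, not just pip:
      AND details.installer.name = 'pip'
    GROUP BY project
    ORDER BY downloads DESC
    LIMIT 8000
"""

for row in client.query(query).result():
    print(row.project, row.downloads)
```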

Do you have any extras? I do. Not too many, but let's do it. So, Owen Lamont, remember we talked about uv-secure, the project he created? Yeah. It scans your lock files. And I was speculating about what API it was using. He wrote in to say, thanks for the shout-outs; it just uses the PyPI JSON API at present to query for package vulnerabilities, same thing that pip-audit does. He does hit it asynchronously to try to make it a little faster, but it's just the simple API there. So that's what that is. Not something like Snyk or some other more advanced threat-modeling setup. And yeah, that's it. That's all I got for my extras. All right.
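If you want to poke at that same endpoint yourself, here's a quick sketch. The per-version PyPI JSON API does return a vulnerabilities list; field names like fixed_in are my reading of that response and worth checking against the live API:

```python
import httpx

def vulnerabilities(name: str, version: str) -> list:
    # Same per-version endpoint that pip-audit and uv-secure consult.
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    response = httpx.get(url)
    response.raise_for_status()
    return response.json().get("vulnerabilities", [])

# An old release with known advisories, just for demonstration.
for vuln in vulnerabilities("requests", "2.19.1"):
    print(vuln["id"], vuln.get("fixed_in"))
```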

Cool. How about a joke? Oh, I've got a joke. People, if they like puns and stuff, this will be good. It's at angle bracket, slash, angle bracket, code puns, or codepuns.com. So we've all written bad code. And I know that sometimes testing will shake out the bugs, Brian. But do you know why programmers prefer dark mode? I think this one is not totally right. I think we should switch it. I think there's a fallacy here. Why? I guess...

I'll read it as it is. Why do programmers prefer dark mode? Because light attracts bugs. I guess if you're talking moths, but if you're talking cockroaches, it's the other way around. But here's the thing. That's a great joke, but you can click more puns and they just keep going. My love for coding is like a recursive function. This is not a very good one. That's fine. Why did the for loop stop running? It took a break. Semicolon. How do you comfort a JavaScript bug? You console it. I see. There's a lot of good stuff here.

Because console.log is how you debug that thing instead of print. Oh, okay, okay. It's the print-debugging equivalent for JavaScript. I'm not a JavaScripter, okay. Well, you certainly can't console your JavaScript bugs when you create them. All right. Why do you not want a function as a customer? Because they return a lot of items. Come on. Anyway, people, go to codepuns.com and click through until you can't take it anymore. Oh, yeah. That's a good one.

That's probably why you'd want a C function as a customer, because they only return one item. That's true, right? Well, good stuff. Yeah. All right. Well, thanks again. Thanks for showing up for Python Bytes. And thanks, everybody, for listening. Yeah, you bet. Thanks for being here. Bye, everyone. Bye.