
#434 Most of OpenAI’s tech stack runs on Python

2025/6/2

Python Bytes

People

Brian Okken
Michael Kennedy
Topics
Brian Okken: I shared an article about speeding up PyPI's test suite. It covers running tests in parallel with pytest-xdist, using Python 3.12's sys.monitoring to speed up coverage, optimizing test discovery, and eliminating unnecessary imports, all of which significantly improve test speed. pytest isn't just for small projects; with a few optimization tricks its performance on large suites can be improved even further. For example, pytest-xdist requires attention to database isolation, and pytest-sugar can improve the output. The -p no: flag can disable unneeded plugins and reduce import overhead. I plan to write an article, or a series of articles, with more practical tips on speeding up pytest test suites.

Deep Dive

Chapters
Trail of Bits significantly sped up PyPI's test suite using several techniques. Key improvements came from parallelizing tests with pytest-xdist, leveraging Python 3.12's sys.monitoring for faster coverage, optimizing test discovery, and eliminating unnecessary imports.
  • PyPI's test suite improved from 163 seconds to 30 seconds.
  • pytest-xdist enabled a 67% time reduction by utilizing multiple cores.
  • Python 3.12's sys.monitoring and COVERAGE_CORE=sysmon resulted in a 53% time reduction.
  • Optimizing test discovery and eliminating unnecessary imports further enhanced performance.
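
The coverage change above is a one-line opt-in. COVERAGE_CORE is a real coverage.py setting; a sketch of guarding it behind the Python version (placing it in your CI environment before pytest starts is the usual approach, so treat this snippet as illustrative):

```python
# Opt coverage.py into Python 3.12's sys.monitoring core.
# COVERAGE_CORE is read by coverage.py at startup, so it must be set
# before coverage begins measuring.
import os
import sys

if sys.version_info >= (3, 12):
    os.environ.setdefault("COVERAGE_CORE", "sysmon")
```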

Transcript


Hello, and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 434, recorded June 2nd, 2025. I'm Michael Kennedy. And I am Brian Okken. And I am super happy to say that this episode is brought to you by DigitalOcean. They've got

obviously a bunch of amazing servers, but some really cool gen AI features we want to tell you about as well. So we're going to be telling you about that later; the link with a $200 credit is in the show notes. No spoiler there. If you would like to talk to us on various social things, tell us about what we might want to cover, or give us feedback on what we have,

links to our Mastodon and Bluesky accounts are at the top of the show notes as well. You can join us live right here, right now, on YouTube, almost always Monday at 10, unless something stirs up the calendar and breaks that, but we try to do Monday 10 a.m. Pacific time. All the older episodes are there as well. And finally, if you want an artisanal, handcrafted, special email from Brian with extra information about what's going on in the show,

well, sign up to our mailing list. And Brian, some people have been saying that they had been having trouble receiving emails. Like, they signed up and they didn't get them. Yeah. Well, that's because there's a bunch of jerks on the internet, and they make it hard to have nice things like email that works.

There are so many spam filters and other things that I've done some reworking, and I think some people who signed up will probably start getting email again. But basically, their email providers had been saying, hey, you're sending from an IP address that has previously sent spam: blocked.

Well, we use SendGrid, and SendGrid just round-robins us through a whole bunch of different IP addresses. And if we happen to get one that previously got flagged, well, then you might get unsubscribed from an email list. How much fun is that? So I've done a bunch of coding behind the scenes to try to limit those effects and just send again next time, because it'll be a different IP address. Ah, jerks. That's spammers for you. Yeah, thanks.

And I guess with that, we're ready to kick it off. What have you got? Well, I've been speeding up some test suites, so I'm interested in this blog post on the Trail of Bits blog. I think we've covered some stuff from them before, but anyway. Yeah, usually they're a security company; they do really interesting research into a lot of security things. Oh, really? Okay. Yeah.

Apparently, one of the things they've worked on, or at least are writing about, is... yeah, it says Trail of Bits collaborated with PyPI several years ago to add features and improve security defaults across the Python ecosystem. But today we'll look at an equally critical aspect

of holistic software security: test suite performance. So there was some effort to speed up the test suite, and it's incredible. One of the reasons I'm covering this is the speedups themselves, but also I often get questions about whether pytest is

robust enough to test a large system. And yes, it is. Warehouse, the code behind PyPI, is a decent size: as of this writing, the test suite is about 4,700 tests.

That's quite a few tests. There's a nice graph on the blog post showing the time spent: they went from 163 seconds, so almost

three minutes, down to 30 seconds. That's a nice speedup. And even as the test counts were going up, the time spent was going down. So how did they do this? The big chunk of the performance improvement was switching to pytest-xdist. xdist is a plugin maintained by the pytest core team, and it accounted for

67% of the reduction. What it does is allow you to run tests on multiple cores. In this case they had a 32-core machine, so they could use multiprocessing. Threading, right? No, it is multiprocess; I think there's some configuration there you can fiddle with, but mostly it's multiprocessing.

So there is some overhead in doing that, so you wouldn't want to use it on an already-fast, small

test suite; a small suite would likely go slower with xdist. But anyway, this is a larger one. And one of the things I like about this write-up is that xdist isn't a free lunch, because it's not always easy to split up your test suite like this. So they were talking about parallelizing the test execution.
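
Splitting a suite across workers means shared resources like a database need isolating per worker. A stdlib-only sketch of the idea: derive a per-worker database name from the worker id that pytest-xdist exposes (the environment variable is real; the naming scheme here is illustrative, not PyPI's actual code):

```python
import os

def database_name_for_worker(base: str = "tests") -> str:
    # pytest-xdist sets PYTEST_XDIST_WORKER to "gw0", "gw1", ... in each
    # worker process; it's unset when the suite runs in a single process.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "")
    return f"{base}_{worker}" if worker else base
```

A session-scoped fixture can then create and drop that database once per worker, so tests never stomp on each other's rows.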

You can just say --numprocesses=auto, or -n auto, and it runs on a bunch of cores. But that doesn't really work if you have a shared resource like a database. So they also talked about setting up database fixtures such that each

test worker gets its own isolated database. They show some of the code for how to do that, but this is open source code, so you can go check out the entire thing if you want. The other thing is that xdist's reporting is pretty terse, so they increased readability by using pytest-sugar. And I don't use pytest-sugar a lot, but it

sure is popular, and it gives you little check marks. It makes the xdist output nicer: green check marks instead of the little dots, and it feels good. Anyway, that was a massive improvement with xdist, but that's not all. Python 3.12 added the ability for coverage.py to run faster

by using the sys.monitoring module; Ned Batchelder implemented support for that a while ago. So they turned that on with the COVERAGE_CORE environment variable, and that sped things up quite a bit as well: another 53% time reduction. Then the test discovery phase. This is an easy one; it didn't buy that much time, but everybody should do this. It's one line of config to say where your tests are. So that's,

that's a good one. And then the last one is unnecessary import overhead. This is kind of an interesting one; I wondered, how did they do this? They're using a thing called ddtrace, which is a Datadog library. I don't really know what it does,

but I looked at the pull request to see how they did it, and they're using a flag, -p, that allows you to turn plugins on or off in your test suite. ddtrace doesn't look like a pytest plugin at first, but it actually comes with a couple of pytest plugins, so that makes sense: those pull in

ddtrace when the plugins get loaded. So anyway, interesting side tangent right there. But a really interesting read on how to speed up test suites. And it reminded me that I've got a whole bunch of tricks up my sleeve too; I need to start a how-to-speed-up-your-test-suite post, or series of posts. Yeah, that's super interesting. Good stuff.
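
Taken together, the collection-side tricks from this segment amount to a couple of lines of pytest configuration. A sketch, not PyPI's actual config (the disabled plugin name is illustrative; check your own timings first):

```toml
# pyproject.toml
[tool.pytest.ini_options]
addopts = "-n auto"      # pytest-xdist: spread tests across all cores
testpaths = ["tests"]    # the one-line config that scopes test discovery
# disabling an unneeded plugin looks like: pytest -p no:ddtrace
```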

Anyway, my next topic, the topic we talk about later in the episode, will be around speeding up tests as well. Okay. Well, it's all about speed this week. Speed. Test speed. All right. This one is super interesting. So I came across...

I don't usually even see this stuff. I don't spend that much time on X. Not necessarily because I've got some vendetta against X, although you would know that from reading our reviews on... I think it was on one of the podcast sites. Somebody absolutely went... is having a moment. Because I said, hey, Mastodon's cool. They're like, oh my gosh. Anyway. No, I don't spend that much time on there because I just find...

that, like, I feel like the algorithm just hides my stuff, and I don't get any real conversations or engagement. That said, I ran across this thing that is super interesting, from Pietro Schirano. And it says, people aren't talking enough about...

how most of OpenAI's (aka ChatGPT's) tech stack runs on Python. And there's this screenshot of a newsletter that talks about it. Okay, so the tech stack. This is super interesting. Python: most of the product's code is written in Python. Frameworks: FastAPI, the Python framework used for building APIs quickly, using standard Python type hints and Pydantic.

C, for parts of the code that need to be highly optimized; the team uses the lower-level C programming language. And then something called Temporal for asynchronous workflows. Temporal is a neat workflow solution that makes multi-step workflows reliable, even when individual steps crash, without much effort by developers. I actually don't know what Temporal is written in. Maybe it's Python. It sounds like it could be. Just to remind me, this is OpenAI's tech stack? Yes. Okay. So I

did some searching and came up with the original article. This comes from a conversation around building ChatGPT Images, the image generation stuff, where you say, make me an infographic about what I've been talking about or whatever, which is incredible these days. So the article is entitled "Building, launching, and scaling ChatGPT Images." It's OpenAI's biggest launch yet, with a hundred million new users generating 700 million images in the first week.

But how was it built? Let's talk about Python, right? So: Python, FastAPI, C, and Temporal. How cool is that? For people who are like, well, it's fun to use Python for these toy projects, but it's an unserious language for unserious people who don't build real things. Or is it? A hundred million new users in a week. That's pretty epic. Well done, FastAPI. Well done,

new versions of Python, all these things. I've got to know: what is Temporal? Is this the one? Durable execution... this one is written in Go, so Temporal itself is apparently not Python. Anyway, isn't that interesting? It's always fun to have a data point. Yeah, I like that. I think there's a lot that people don't even talk about that is written in Python and FastAPI now. So it's a different world. Mm-hmm. DigitalOcean. DigitalOcean is awesome. Yeah, DigitalOcean powered

Python Bytes for a very long time. I love DigitalOcean; I highly recommend them. But you've got something specific to say, don't you? I do. This episode of Python Bytes is brought to you by...

DigitalOcean. DigitalOcean is a comprehensive cloud infrastructure that's simple to spin up even for the most complex workloads. And it's a way better value than most cloud providers. At DigitalOcean, companies can save up to 30% off their cloud bill. DigitalOcean boasts 99.99% uptime SLAs and industry-leading pricing on bandwidth. It's built to be the cloud backbone of businesses small and large.

and with GPU-powered virtual machines, plus storage, databases, and networking capabilities, all on one platform, AI developers can confidently create apps that their users love. Devs have access to the complete set of infrastructure tools they need for both training and inference, so they can build anything they dream up. DigitalOcean provides full-service cloud infrastructure that's simple to use, reliable no matter

the use case, scalable for any size business, and affordable at any budget. VMs start at just $4 a month, and GPUs at under $1 per hour. Easy-to-spin-up infrastructure built to simplify even the most intense business demands. That's DigitalOcean. And if you use DO4bytes, you can get $200 in free credit to get started.

DigitalOcean is the cloud that's got you covered. Please use our link when checking out their offer. You'll find it in the podcast player show notes. It's a clickable chapter URL as you're hearing this segment, and it's at the top of the episode page at pythonbytes.fm. Thank you to DigitalOcean for supporting Python Bytes. Indeed. Thank you very much. All right. Let's see what we got next, Brian. Okay.

PyCon. Neither of us made it to PyCon this year, did we? That's too bad. But, you know, c'est la vie; sometimes that's how it is. And I would venture that most of the people listening to this show didn't either, because if everyone listening attended PyCon, it would sell out many times over. So most people here will be very excited to know that they can now watch the talks. Most of them; there's something going on with about 40 of them, but

there's a bunch; something like 120 of the talks are online. So I'm linking to the playlist for the PyCon videos, which is pretty cool. This came out a lot quicker than last time. Last time it was months until they published these, which was unfortunate. But this is like a week or so after the conference. So that was really good. Yeah.

That's incredible speed. Yeah. And I pulled out some I want to highlight. It's too hard to navigate the playlist, so I'm just going to read out the ones I like. So I found the keynote by Cory Doctorow to be super interesting. It was basically going deep into his whole enshittification idea that he's been talking about, which is a really, really interesting one. A little hard to hear because of the mic, but, you know, it's okay,

still worth listening to. There's one talk entitled "503 Days Working Full-Time on FOSS: Lessons Learned," which sounds really interesting. There's a talk,

"Going from Notebooks to Scalable Systems," with Catherine Nelson. And I just had her on Talk Python. For all of these, I'm linking them in the show notes, and where there's a Talk Python episode, I linked over to that episode as well; hers is not quite published yet, just recorded in advance. "Unlearning SQL." Doesn't that sound interesting? Most people are trying to learn SQL; why would I unlearn it?

"The Most Bizarre Software Bugs in History." That one was interesting. "The PyArrow Revolution in pandas"; I also did an episode with Reuven Lerner about that. "What They Didn't Tell You About Building a JIT Compiler in CPython" by Brandt Bucher; I also did a Talk Python episode about that and linked to it. This one's cool: from Hynek, "Design Pressure: The Invisible Hand That Shapes Your Code." He's got some really interesting architectural ideas. So super cool.

"Marimo: The Notebook That Compiles Python for Reproducibility and Reusability"; I talked about that in an episode. "GPU Programming in Pure Python"; I covered that in an episode too. And finally, "Scaling the Mountain: A Framework for Tackling

Large-Scale Tech Debt." That looks interesting. Don't all those talks sound super interesting? Yeah. So I've linked all of them; I pulled them out, and y'all can check them out if you want. They're in the show notes. "The Most Bizarre Software Bugs in History": total clickbait, but I'm going to watch it this afternoon. Exactly. I can't wait to watch it. Yeah, it's fun. All right, over to you. Okay. The post I found has an interesting header, but anyway,

I just wanted to find a post to talk about this, because it's a technique that I use for speeding up test suites, and it wasn't covered in the post we just talked about. So: optimizing Python import performance. In the previous discussion, we talked about using -p in pytest to remove plugins, which can remove imports of things you don't need. But what if there are things that you do need,

but not all the time? So one of the things I want to talk about is test collection. Like the other article, they used python -X importtime; you use it by just running your app (or running pytest) with it, and it prints out a list

of all of the imports and how long each took. This has been in since Python 3.7, I think. It's a little bit hard to parse, because it's sort of a text-based tree, but it's not bad. And looking at all of that,

you can try to find out which ones are slow. So one of the techniques is lazy imports. This is a quirk of Python I didn't learn until recently: normally we put imports at the top of the file, and when you import a module, it also imports everything that that module imports. So if some dependency isn't really part of your core logic,

and users of your module don't need it just to import the module, you can hide that import inside a function. So in this example, process_data does import pandas as pd, so pandas is only imported when the function runs. And after that first run, Python has cached the module in sys.modules, so any later import of pandas, anywhere in the process, is nearly free. Kind of a weird quirk, but it works well. And I'm

I'm just going to tie this to testing right away. At test collection time, pytest imports everything. And you probably don't need to import your dependencies at collection time, so I look for any expensive imports and move them into a fixture, usually a module-level autouse fixture, to get that import to only run when the tests run,

not when you're doing collection. So that's an important trick. Next: avoiding circular imports. Hopefully you're already doing this, but it says circular imports force Python to handle incomplete modules; they slow down execution and cause errors. Well, I knew they caused errors;

I didn't know about the slowing-down-execution part. So there might be some legitimate-seeming cycles, sort of, but get rid of them; it does sometimes mean you have to restructure your code. The third thing is keeping __init__.py very light. This is a tough one, because sometimes I have a tendency to shove things into __init__.py,

especially imports; but keep those __init__ files as clean and fast as possible. So there are other things in this article as well, but those are the three I really lean on to speed up test suites by cleaning up import time. Yeah, that's really cool. And you might wonder, well, what is slow? What is fast? How do you know?

Well, there are tools to profile imports. We've talked about them before; I don't remember which one we covered, but there's one called import_profile. Cool, I was looking for that. Nice, thanks. Yeah, so you can run your code with it and give it the things

you would be importing, and it gives you a nice little timing for each of them. And some of them are super slow, and you might be wasting your energy trying to optimize that stuff; some are fast, and some use a lot of memory, and different things like that. So this is actually pretty interesting, to see how that all works, right? Yeah. And actually, when I was doing some optimization for a big test suite recently, I

learned you've got to measure, because there were things that were fairly large packages that I just assumed were going to be slow, but they had their imports optimized already. So some packages that seem large, like pandas, might actually be pretty quick; it looks like in this example pandas wasn't, but NumPy, say, does a lot,

and it's still a pretty fast import. So yeah, interesting.
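
The lazy-import trick Brian describes looks like this; here `json` stands in for a genuinely heavy dependency such as pandas (a sketch, not code from the article):

```python
import sys

def process_data(rows):
    # Deferred import: the cost is paid on the first call, not when this
    # module is imported (e.g. during pytest collection).
    import json  # stand-in for a heavy dependency such as pandas

    # After the first call, the module is cached in sys.modules, so
    # re-running this import statement is nearly free.
    return json.dumps(rows)
```

Note that the name `json` is local to the function; what persists is the entry in sys.modules, which makes every later import of the same module cheap. Running `python -X importtime -c "import yourpackage"` shows where the import time actually goes.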

It also depends, you know; you only import it once per process, right? It's not going to be imported over and over again. So if it's a hundred milliseconds, is that worth worrying about? Maybe, maybe not. And also, about turning things off: in a test or development requirements file, there might be some extras.

So there are two sets of things that I look at often. There are pytest plugins that are useful for developers running locally, but they're not really needed in CI, so you can take them out in CI. And then the reverse: in CI, I often have a reporting system; I might export all the data to a database and have some plugins to handle that,

and that's not needed locally, so I turn those off locally for a slightly faster run. Things like that. Anyway, speeding up testing is a good thing. Yeah, absolutely. All right, extras. You got extras? Yeah, I've got a couple. How about you? Let's go with yours first; I've got a few too. Okay, so this is pretty quick. This is from Hugo: "PEPs & Co.", a little bit of history about where "PEP" came from. I've just been

using the word. But apparently Barry Warsaw came up with the acronym, and he calls it a backronym: he liked the sound of "pep" because it's peppy, before he came up with the Python Enhancement Proposal expansion. So that's an interesting thing. But the post also takes a look at how, since then, there have been a lot of improvement-proposal and enhancement-proposal-like acronyms all over the place in

different communities. So this is interesting, like Astropy Proposals for Enhancement, and you totally know that they intentionally reversed those so they could make APE. That's great. Just a bunch of fun. The second one is pythontest.com; my blog has got a fresh coat of paint. It's got light and dark modes, and it's more colorful now. It also makes it glaringly obvious that I don't blog as much as I'd like to. The newest posts are from, jeez,

January of 2024. Oops. Got to get on that. Part of it is I didn't really like my theme, so I wasn't really blogging much; I think I like this one better, so hopefully I will. Is it still running on Hugo, or what's it running on? Yeah, it's Hugo, with a JavaScript-based search thing, and it's pretty zippy.

I like that. Yeah, very cool. There was one change that I hope to make: the light mode for code highlighting looks fine, but the dark mode is a little hard to read. That red on black. Oh, yeah, yeah. Okay. You could probably change that with a little CSS action. Probably. Yeah. Anyway, those are my extras. How about you? Oh, very nice. I've got a few.

Just following up on your extra real quick: I'm doing a "stories from Python history through the years" panel, and Barry Warsaw is going to be on it, on the sixth, which is, what, four days from now? Thursday, something like that.

Okay. Yeah, that PEP story; I want to try to get him to talk about that. So, my extras. This one could certainly be a full-fledged item, but I don't have enough experience with it; I don't really know. But here's the deal. You could have some kind of SaaS bug system. It could be Jira; everyone loves Jira.

Or it could be GitHub Issues, or something else. But what if you just want something low-key that you know is going to be right there with your project? Git is already distributed. So there's this thing called git-bug, okay? And the idea is it's a distributed, offline-first, standalone issue management tool that embeds issues, comments, and more

as objects inside your Git repository. So when you git clone, you just get the issues, and when you git pull, you update your issues. Oh, cool. Interesting, right? It comes with some kinds of UI; I haven't played with it that much. A CLI, a TUI, or even a web UI that you can run on top of this. So that's pretty neat. And then it will sync back and forth with things like GitHub Issues, or GitLab, using something called bridges.

I think that's pretty cool, actually. I don't know how it works, but it seems pretty cool. Yeah. But I have not used it at all, and I don't have much need for it, because all my stuff's on GitHub; I'm just going to use GitHub Issues, and that's just fine. But I can see certain use cases where this would be pretty neat. What else have I got? A follow-up from last week. This is

from Neil Mitchell. Remember we talked about Pyrefly, the new type checker, the mypy-like thing from Meta? Yeah. And I said, oh yeah, it's kind of like ty, formerly Red Knot, from Astral. And I said, oh, but Astral has this LSP. So we got a nice comment on that show from Neil. It says: hey, from the Pyrefly team here, thanks for taking a look at our project. We do have an LSP; IDEs and LSPs

are approximately synonyms nowadays. And we're exploring whether this can be added to Pylance. So there's more to Pyrefly than I gave them credit for. Very cool. Yeah. What else? I think that's it. You ready for a joke? Wait, IDEs and LSPs are synonyms? I think he means the features that make IDEs more than just a basic editor: go to definition, refactoring, find usages, that kind of thing. I think that's what he's saying. Okay. Yeah.

Yeah. I mean, I wouldn't fire up Pyrefly just from a command prompt: "I need to edit this email; let me fire up an LSP." Yeah, exactly. But I think that's what he said: the features of IDEs are basically just front ends to LSPs.

All right, are you ready for a joke? You did a nice job full-circling your whole experience, with your pytest stuff and so on, so I'm going to do that as well. So check this out. As programmers, we're all aware by now, surely, that AI tools are good at writing code, and it's going to mean some sort of change

for us somehow, right? Some of us are worried that, you know, maybe this might be the end. So the joke: there are two programmers about to be hanged on the gallows; it's pretty intense. The caption says "programmers worried about ChatGPT." And then they look over at the other guy, and his caption says "mathematicians who survived the invention of the calculator." And he asks: "First time, eh?" Yeah.

That's pretty good, right? Yeah, that was pretty good. I wonder if that image was made with ChatGPT; that would be sweet sauce on it. Probably not; it's probably from a movie I haven't seen. Yeah, probably. Maybe. "First time." The mathematicians have survived. Yeah. Gosh. I'm going to

show my age, but I still remember all my math teachers saying, you have to learn how to do this by hand, because you're not going to walk around with a calculator every day. Yeah, you're not going to, are you? Wait. Well, you know, maybe. Maybe you just hold up your phone. Yes, I will. Or your watch, or whatever. Yeah, my kid's oddly good at pointing her phone at anything, like a math problem, and getting an answer. She uses it to check her work, which I think is good.

At least that's the claim. Yeah. But we live in amazing times, but also interesting times, as the curse slash quote goes. Yeah. I'm going to have to read the comments to figure out what async and banana suits mean. Oh yeah, there was a banana suit in one of the videos. Yeah. Henny out there, pulling out some of the talks, pointed out one more talk they thought was interesting, from

Pablo Galindo Salgado and Yury Selivanov. It was especially fun talking about async and wearing banana costumes. What could be better? Nice. Well done, guys. Yeah. Very nice. Thanks for being here, Brian. Thanks, everyone, for listening. See you. Bye. Bye.