
The role of AI in software testing - Anthony Shaw

2025/4/25

Test & Code

Transcript


Got a plan. Let's do this. I am a really good tester and I want to create amazing tests. What should I do?

Welcome to Test and Code. This episode is sponsored by Porkbun.com, named number one domain registrar by USA Today from 2023 to 2025. Right now, Porkbun has a special low price of less than $6 for the first year of your next .app or .dev domain name. Ideal for developers, web designers, engineers, or anyone in tech.

.app and .dev domains are perfect for your latest projects. Showcase your world-changing application on .app and use .dev for your blog, documentation, and more. As always, every Porkbun domain comes with lots of freebies like SSL certificates, Whois privacy, URL forwarding, and hosting trials.

With deals like this, backed by personalized support from real people 365 days a year, it's no wonder so many tech people and enthusiasts are making the switch from places like GoDaddy and choosing Porkbun instead. To get the first year of your next .app or .dev domain for under $6, visit porkbun.com slash test and code 25 or click the link in your show notes.

Hello, Anthony. Hey. Hey, Brian. So welcome to Test and Code again. I was just looking it up and it's been a long time. Unless I got this wrong, I think the last episode was episode 101, February 19, 2020. Awesome to have you back. There's been a lot of changes in the last five years. AI is one of them. It's one of the things we were going to talk about today, right? Yeah, it is. Do you want to explain how this ended up happening? No, you explain. Okay.

Yeah, so you made a comment on one of the podcasts that I'm overly optimistic about AI. Let's put it politely. And I said, I'd be happy to talk to you about it and actually kind of show like what role AI has in testing and kind of what mistakes people are making with it, where people are overselling it and actually just explore a bit more. So I thought it'd be a good discussion.

Yeah, and I actually have seen a lot of that sort of stuff promoted, but I haven't played with any of it yet. So that sounds great. So how do you want to start? Yeah, so I think maybe we can start with talking about the...

The number one testing framework. If you remember the Python developers survey, what's the number one answer for which testing framework do you use? I'm not sure. Was it, I don't use testing? Yeah, nothing. The number one testing framework for Python is nothing, or, I'll get around to that at some point.

Yeah, I think it's important to understand why that's the answer. I do a lot of open source work and read a lot of open source projects, and I feel like if you compare Python to JavaScript, 70% of the JavaScript packages I read don't have tests, and 70% is being generous. Whereas more of the Python projects I look at have some kind of testing.

And why do you think that is? Why do you think Python has more? It's still not enough, but... Well, I was thinking of JavaScript stuff. I'm guessing that people try it out. They just try it in the web browser. Or maybe the project that's using the JavaScript package has tests. I don't know. Testing is hard and people don't learn how to do it. Yeah, I think it's the last one. I think it's the latter. I think testing is...

Testing is easy when you know how to do it. And it's something you don't really think about. But I always feel like the hardest bit is writing the first test, the scaffolding of the test. And only once you know how to write tests properly do you then consider, when you're writing the code, how am I going to test this? Because when I look at projects that don't have tests and I ask the developer, okay, can you create some tests? Or why haven't you created any tests? They're like,

oh, but how do I start? Or, how do I even test this? And then you look at the code and it's written in such a way that it would actually be quite cumbersome to set up a basic unit test. Like, let's just test a single happy path: we give it some inputs and it gives us an expected output. Even doing that as the first test is quite difficult, because people didn't ever consider having to write tests in the first place, and they just wrote the code however was easy for them.
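
To make that concrete, a first happy-path test can be as small as this. A minimal sketch, where shipping_cost is a hypothetical function standing in for the code under test:

```python
# One happy path: known input in, expected output out.
# shipping_cost is hypothetical, purely for illustration.

def shipping_cost(weight_kg: float) -> float:
    """Flat rate plus a per-kilogram charge."""
    return 5.0 + 2.5 * weight_kg


def test_shipping_cost_happy_path():
    assert shipping_cost(4.0) == 15.0
```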

Yeah, but they must know. I mean, a lot of people have some idea that it's working. So when I talk to people about that, I just say, well, how do you know it's working now? Put that in a test. But I'm usually thinking about API-level tests or system-level tests. And there's a lot of bad information out there about test writing.

This isn't the topic of this show, but the thing I really can't stand is people that say, okay, we're going to start testing, and it's going to take you twice as long to write your code now, but there are benefits. I'd like to hit those people. It shouldn't be taking you twice as long. It should take you about as long as it takes you now. You'll just debug less. So that's a pet peeve of mine, aimed mostly at the unit-test test-pyramid people.

Anyway. Yeah, the pyramid thing is interesting, but I don't really follow it. I think it always depends on the project, and on when you're writing the software. You kind of have an idea of which parts of the project need particular attention, like which bits are flaky. If I write, I don't know, 20,000 lines of code, not every line of code is equal,

right? I wouldn't say, okay, my goal is to get 90% coverage, although that is sometimes useful. I wouldn't say my goal is to cover every single line equally, because some of it is not potentially going to be brittle anyway. And then there'll be parts of the code, like a function in there where it's a little bit complicated.

It's calling things in a certain way or it's got lots of expectations or whatever. And then I know that I should be focusing more testing time on that. So like how does that handle different edge cases and stuff like that? The other thing is a lot of teaching. Like we teach people to build cool things by plugging a bunch of stuff together. And we teach testing by how to test a function. But a ton of code isn't functions. It's hooking things together.

I also think about the education system. I didn't get taught a lot of testing in college, and I still don't think people get taught that much. And I think it's not because we shouldn't. I think it's because the instructors don't know how to.

So testing is the thing that we think we need or we know we need, but we don't have enough tests. Everybody figures they probably ought to have more tests. And so I think that's possibly why some of these companies are thinking AI to the rescue. We can have AI write the tests and then we don't have to. Maybe. I don't know.

Yeah, I was working on some requirements, writing a long spec and giving it to an LLM and saying, can you create this code for me? And at the same time, I wrote a test suite that would test whatever code the AI created, testing all the things that I put in the spec. It was a really simple data reflection class. I said there should be a name and an address and

a unique ID, and that field should not be changeable. The address should always have two lines, and the name should never be empty; it should always be a string. So I'd give it one big list of instructions, it would create the code for me, and then I'd run the handwritten tests against it at the end, just as a way of comparing these different AI models.
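
A handwritten spec-check suite like that might look roughly like this. This is only a sketch: the Contact class stands in for whatever each LLM generated, and all the names here are invented:

```python
import dataclasses

import pytest


@dataclasses.dataclass(frozen=True)
class Contact:
    """Stand-in for the LLM-generated class under test."""
    name: str
    address: tuple[str, str]  # spec: an address always has two lines
    uid: int                  # spec: a unique ID that cannot change

    def __post_init__(self):
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        if len(self.address) != 2:
            raise ValueError("address must have exactly two lines")


def test_name_must_be_a_non_empty_string():
    with pytest.raises(ValueError):
        Contact(name="", address=("1 Main St", "Springfield"), uid=1)


def test_address_must_have_two_lines():
    with pytest.raises(ValueError):
        Contact(name="Ada", address=("1 Main St",), uid=1)


def test_unique_id_is_not_changeable():
    contact = Contact(name="Ada", address=("1 Main St", "Springfield"), uid=1)
    with pytest.raises(dataclasses.FrozenInstanceError):
        contact.uid = 2
```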

And what was interesting: all of them got it, all bar one. I think one of them got a bit confused by one of the requirements. They all created slightly different code, but they all spat out an answer. But really, what it was producing was very entry-level programming, like my-first-engineering-job level. It wasn't a complicated thing that I'd asked it to do. And I saw a video afterwards, I'll share a link with you, because it just made me laugh.

There are people now who, instead of even writing the spec for the AI, ask the AI to write the spec for them. Somebody was doing a demo where they said, I want it to create Angry Birds, but the AI works better when you give it a detailed list of requirements for exactly what you need, and I can't be bothered to type that in. So he asked the AI, can you describe the game Angry Birds in terms of a software spec?

And it did it for him. And then he copied and pasted that back into the AI and said, can you create this project for me? And it went and built a mobile app called Angry Birds, which I'm sure breaks some sort of copyright. And the brilliance of the video is that if you just paused it before the demo, you'd be like, this is amazing. This is going to take over the world. This technology, like as engineers, like we're really in trouble now.

But it's just so funny, because when you actually watch the demo, it's like the worst thing I've ever seen. It pops up with a title, it had a menu and stuff, and a game. And then you click on a level, and the little pigs... I don't know if you've ever played Angry Birds, but you have a catapult and you catapult birds at the pigs. Yeah.

And they blow up and stuff like that. Yeah, but the pigs just kind of fell out of the sky, dead already. The bird's beak was on its chin. He tried to sling the catapult, but the physics of the catapult arms were just completely wrong, so it didn't make any sense. And then he eventually managed to fling the bird and it just crashed. And that was the end of the video. He's like, ta-da! Yeah.

Okay, but it got that far. Maybe it's easier to fix it from there. I don't know. Yeah, it is. And it's like, does that help you? I don't know. That scaffolding... because if you asked me, how would you make a game for a phone, I wouldn't have any idea how to start. Whereas at least that's given me something to go on. It had a menu, and you could press it, and it launched on his phone. I don't even know how you would do that.

But it's interesting because it gets you started. And a lot of the times when you're working with a technology that you're not super familiar with or you don't use every day, the hard bit is just getting the first prototype running. And then once you've got something running, you can kind of iterate on it.

That's actually the part that scares me a little bit, because the thing I feel like I'm using AI tools for is the stuff I don't know about. I needed a PowerShell script, and I'm not a PowerShell person, so I described what I needed and I got a script out that worked. So I was happy. However, with the stuff that I do know about, like pytest stuff, when I ask

the AI to do stuff for me, it's totally wrong. So my worry is that for the stuff I'm happy with, I just don't know enough about it to know that it's totally wrong. Or that it's inefficient or ugly or bad code or whatever. So I think we're going to have a lot of code that works, sort of, but is bad. I generally agree with you. And I think if you don't know how to use the tools

effectively, or if you just blindly accept what they're suggesting, then it's not good quality. Stack Overflow, I think, is a bit better, because at least if you ask a question on Stack Overflow, there's an accepted answer. That's mostly what I compare the chat AIs to: Stack Overflow. Because for developers, if you don't know how to do this,

the old way, and it still is the current way, is you just ask the internet, and it generally tells you to go to Stack Overflow. You go there, someone else has asked the same question, there is an answer, and you copy and paste the answer into your solution, you see if it works, and if it does, you ship it. I feel like that's what most developers have done for a very long time. And only when you get really experienced developers

do you read the answers and go, I wouldn't have done it like that? But then you're not even asking the question, because you already know how to do it. Well, one of the tricks with Stack Overflow, of course, is to not pick the accepted answer, because the person that asked the question is the one that accepts it. It's often the answer with the most upvotes that's better: other people went, yeah, that seems hinky, and upvoted some other answer, and that's usually a better one.

The mistake I've made before is copying the code from the question. Because I was really tired. Yeah, because sometimes they're really complete questions with code examples, and then right at the bottom it says, but this doesn't work. Yeah, that's funny. So one of the tips that I got from somebody recently, or at least heard, maybe on a podcast, was to ask a question

of the AI, and whatever answer it gives you, it's going to include a lot of terminology, and that's enough for you to find and understand the documentation. And they actually used pytest as the example: pytest heavily uses fixtures, but if you don't know what a fixture is, that word doesn't mean anything to you. Yep. So you wouldn't even know to look it up. So yeah. Anyway. Yeah.

Yeah, so I thought what we could do is pick one of your projects. We've got one which is called Cards. Didn't you do an episode series on this years ago? Probably. And it's also heavily embedded in the current pytest book. Oh, yeah, yeah. I've got both. So I thought what we could do is pick this

project, and I've got it on the screen, but I'll be descriptive for the podcast. And I'm going to get the AI to write some tests for us. Because, and we'll talk about this, what role does the AI have in testing? I feel like a year ago this technology was definitely a lot less mature and more rudimentary.

One of the main lines I would hear, and I said this myself as well, is that the AI is good at writing tests. And I've been using this every day for the last, I don't know what it's been now, a year or two? It feels like a long time, but it probably wasn't that long. Whenever the early versions came out, anyway.

I've been using it every day, and I've slowly been going back on some of those statements and saying, well, it kind of depends. Yes, you can get it to do that, but you kind of need to know what you're doing. And it goes back to your Stack Overflow point, or just generally with technology: if you're asking for help from something, whether that's the internet or the AI or the person sitting next to you,

you need a level of knowledge to know whether the answer they're giving you is good enough. This is more philosophical than technical, but do you just blindly accept what they're suggesting? And the danger with the AI is that if you don't ask the question correctly, the answer it gives you could be poor or wrong or buggy or all of the above. And that's one of the big challenges. Or just old. Yeah. Oh yeah. That's another common issue.

So how are you using AI usually? Are you punching in questions into an interactive thing or are you using Copilot? Yeah, so I rarely use the chat UIs. So there's a few of those. There's like

ChatGPT is the most popular one by a really long way. And I've watched developers who've got a browser open on one screen with ChatGPT and their editor open on the other screen, and they're basically just constantly going backwards and forwards between the two, copying stuff back and forth.

Copilot, like GitHub Copilot, is built into VS Code, and it's built into Visual Studio and PyCharm, and now Eclipse as well, the Java one that everyone uses.

I feel like pretty much every IDE has one now. It has a chat box somewhere. And then Cursor, it's just sort of built into Cursor, right? Yeah, Cursor's like a whole different thing. It's like every button in Cursor has got some kind of AI in it. Yeah.

It's like, short of just moving the mouse for you, it pretty much does everything. So when you run tests in Cursor, if one of them fails, it will suggest that it fixes it for you, which...

It kind of again goes back to: if you don't understand this technology well enough, that can be quite dangerous. If it fixes it for you, it could just be removing the assertion from the test. I was just going to say: hey, it's this nasty assert right here, that's what's causing the problem, I've made it pass. Or I've seen some pytest extensions, joke ones, like pytest-yolo or something, that just let all the tests pass.

Yeah, it generally doesn't do that. It looks at the error messages and then tries to guess, based on experience, what's happening. So, I mostly use it for code completion. So here, I'm in VS Code, and I've got code completions enabled, which means as I type,

it will basically finish my sentences for me. So, I just typed def, which could be anything, and it's assumed I'm trying to write a function called get_card_state. We're looking at your cards project; there's an API in here. I don't know why it's come to that conclusion. It's oddly not that annoying when you get used to it. You just sort of ignore the gray unless you're curious.

Yeah, and this is also... I knew I wanted to turn on Copilot rules. So, it's kind of like the code completion that we've had forever. When you call a function, the fact that the editor tells you what parameters it has and suggests them for you, that's a really helpful feature. And when we originally had that feature,

there were a lot of people who actually complained about it, like, this is cheating. That's awesome. But it's helpful: I don't want to have to go and find the source code for the thing that I'm calling and look at the parameters. And if you put a docstring in a function and describe the parameters, or you put type annotations on them... the whole point, well, the vast majority of the time we use type annotations in Python now, is as documentation.

So we've kind of had this ability for a while. When you start calling a... okay, so in this one, we've got a data class called Card. And if I just make a card, like demo_card equals, it's probably going to predict that I'm going to create a Card. The reason it's doing that is because everything before the cursor is context for what it thinks I'm trying to do. So if I then start the Card call, it should then

try and guess the parameters based on what comes before the cursor. So you've got a data class here called Card. It's got summary, owner, state, id. It's looked at those, and predicted that because I've called it demo_card, I just want demo inputs.
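
As a reference for the examples that follow, the Card data class being described is roughly this. A reconstruction from the conversation, so details may differ from the real cards source:

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Card:
    summary: str = None
    owner: str = None
    state: str = "todo"
    id: int = field(default=None, compare=False)  # id doesn't affect equality

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

    def to_dict(self):
        return asdict(self)
```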

So that's actually not too bad. It picked "demo card" for summary, "demo user" for owner. Yeah. And it's got a number for the ID. And then if I tell it what I want to do, like if I write a comment saying, also create a demo card from a dict,

if I say that's what I want to do, it will create a dictionary with basically the same parameters that I just had before. So again, it all becomes context. You're kind of stacking it one on top of the other, and then the next line will be, okay, now call the from_dict function. The method, sorry. So if I was writing...

if I'd written some code and I wanted to start validating it, normally the first thing you do is arrange your test data, then you act on the code, and then you assert what your output should be. That's the AAA pattern in testing; it's what pretty much every test looks like if you boil it down to three components. And so the AI is able to look at the code, the context before what I was writing, and say, okay,

you've got a type called Card, it's got these fields, it's got these methods. And the instructions, what you would normally type into ChatGPT, you can just write as a comment. The way you steer these AIs to do what you want is to be overly verbose with comments. And if you do that, it will populate this for you. I find this helpful because,

like, it's just cumbersome to write this stuff out. This isn't difficult code. It's just constructing an instance of the class with some demo data. I could do that. An entry-level engineer could do that. It's pretty obvious. There's nothing clever about this. It's just convenient.
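
The steer-by-comment flow just described, sketched out. This assumes the Card class from above is importable from the cards package; the comments play the role of the ChatGPT prompt, and the completion fills in the lines beneath them:

```python
from cards import Card  # the data class sketched above

# create a demo card
demo_card = Card(summary="demo card", owner="demo user", state="todo", id=1)

# also create a demo card from a dict using from_dict
demo_card_from_dict = Card.from_dict(
    {"summary": "demo card", "owner": "demo user", "state": "todo", "id": 1}
)
```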

Right. And I actually found the comment thing by accident. That's one of the cool things: you can just accidentally discover these things. I wrote the start of a function, wrote a comment for what I wanted it to do, and then

the suggested code was pretty close. I usually don't have the code be exactly correct, but occasionally that happens. Usually, or often, it's close enough. Yeah. So I think where these tools are good is where everything that comes before the cursor is correct and relevant. If that's the case, then what it predicts is usually pretty good.

If you, on the other hand, start with a blank file, let's make it Python, and I write a comment that says, download the internet, what it produces is probably just going to be nonsense. Let's see what it comes up with. Come on. Come on. Def. Download. Okay. I don't even know. Is this going to keep going? Like, if I just keep pressing tab...

It's writing a program. I don't know where it's going. Wow. Okay. So we can then go back and ask, okay, what have you actually done here? It's downloading some files. I don't know what it's doing. It just looks like a program. It looks like it made a downloaded-files directory, it's looping through something, it's joining the path. Huh?

This code may or may not work. I don't know. Yeah, I don't think this is going to do anything useful. But if you asked it to make a program that downloads the internet... I don't know how it would use context. The important thing here is that there was nothing before the cursor. I think people misunderstand this, like it's somehow reading and understanding your whole code base.

Most of the time it's not doing that unless you tell it to.

It's just looking at what is on the screen and often what are the tabs you've got open in the editor. So if you do split screen between two classes, it will kind of look at both of them. So if I have more tabs open, it will do better? Yeah, generally. Oh, cool. Also, I didn't know about the cursor part. So even if I want a function in the top part of the file, it might be a better function if I write it at the bottom of the file first and then move it?

Yes, annoyingly. Interesting. These are great tips so far. Yeah. And there are other things it will do. If you make a mistake, it will kind of propagate that mistake. So if I go with open... I was demoing this the other day. If I just type with open, it assumes I want to write to a file called foo.txt, and it will write that for me. If I add an extra parameter on here that's wrong, like

bananas equals two, then it will just propagate that error. So, what I was saying about the cursor: now I've made another function. I made one called foo, which calls open in a with context manager, with an extra parameter called bananas, which is wrong; there is no such parameter. And then when I make another function called bar, it's like, hey, let's do the same thing in bar, with two bananas.
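
Reconstructing that demo as a sketch. Note that open() has no bananas parameter, so either function raises a TypeError if actually called; propagating the mistake is the point:

```python
def foo():
    # the completion's suggestion, plus a deliberately wrong extra
    # keyword argument: open() has no "bananas" parameter
    with open("foo.txt", "w", bananas=2) as f:
        f.write("foo")


def bar():
    # start typing "def bar" and the suggestion repeats the mistake
    with open("bar.txt", "w", bananas=2) as f:
        f.write("bar")
```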

So... I'd argue that bananas should be an argument to open. Oh, and now it wants baz as well. And it's going to do the same thing, right? So this is more like if you ever use Excel and you select two rows in Excel and then you drag them down, it will just look at the pattern and basically copy and paste the pattern. Yeah. So...

If I just wrote a line that said banana. Well, copying mistakes is bad, but the whole pattern thing is kind of nice. If I do have to do repetitive stuff, having it do mostly the same thing again is kind of cool. Yeah, so if I just wanted to write the word banana again and again, it would assume that that's what I'm trying to do and it would just keep doing it.

So that's kind of where it gets to. Sometimes these will get stuck in a loop. They've improved, I think, over the last six months, but sometimes the things that come before the cursor are actually a repetitive pattern, and it will think that you just want that pattern again and again and again.

One of the common ones was import statements. It was like, oh, you've imported five things from that module, therefore you just want to continue importing things from that module forever. It's like, no, that's not what I want to do. If we take our example with a card here: okay, so we've got a demo card, which calls the constructor on the data class. We've given it our parameters, and it wrote that code for us.

And then I told it what I wanted in a comment. I said, I also want to create a demo card from a dict, because I saw your class has a from_dict class method. And the other thing I want to do is check that the output of to_dict matches the dictionary that I gave it when I called from_dict.

Yeah, that'd be a good test. Yeah, I reckon it's going to predict that I'm going to do that next. Oh, close. Okay, it's predicted that I want to assert... I'm checking some of the statements. If I tell it, check that to_dict and from_dict are working... oh, there we go. It got it in the end. I just gave it a comment and it knew what I wanted to do. So now it's got an assert saying that when I call to_dict on the card that it created, it's the same as that dictionary.

However, and this is where we can talk about some of the nuances: can you spot the mistake, Brian? Well, I mean, other than... I don't know. I mean, I think that test would pass. Well, it's not a test, for one. It's just... Yeah, yeah, yeah. But it's checking that one, not the one I made from the dictionary. Oh, right. So, instead of doing...

But that's hard to spot, because it kind of looks okay. It should be testing the one I made the second time around. So this is the other trick: when you're writing tests, if you write one that tries to do too many things, the code can get confused. It depends on the model, but I think I've got this configured with the most advanced model we have at the moment.

But it uses everything before the cursor as context, including the encapsulation. And I'm not writing this in a function here, which again is not realistic, because normally you would write this in some sort of function. If I did that, and I described things better, it should give me better answers. But there are definitely times where it can suggest things which are wrong but really hard to spot, and that's what's tricky.
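
Here is that nuance written out as a sketch, reusing the Card class from earlier. Both asserts pass with this demo data, which is exactly why the wrong one is hard to spot:

```python
from cards import Card  # the data class sketched earlier

card_data = {"summary": "demo card", "owner": "demo user", "state": "todo", "id": 1}
demo_card = Card(summary="demo card", owner="demo user", state="todo", id=1)
card_from_dict = Card.from_dict(card_data)

# What the completion suggested: checks the card made directly,
# not the one made from the dictionary.
assert demo_card.to_dict() == card_data

# What it should be checking: the card made the second time around.
assert card_from_dict.to_dict() == card_data
```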

And actually, that test would have passed, because the data that it gave to both those cards is the same. Yeah. Can we get to the testing part? What are people saying AI can do for you for testing? Yeah. So there's a demo that I've definitely seen a hundred times, and I'm going to pick on your code. I'm not going to pick the data class, because that's too easy. There's another one here: you've got a class called CardsDB that's got

an initializer and some methods on it. And I'm going to ask it to generate a test. So, there's a couple of ways you can do that. You can do it inline, so you could say, can you make tests for this? Or you can ask in the chat. And they now have a thing called slash tests, I think, or setup tests, and it creates them for you. I've never really used this. Tests for the

CardsDB class. It will look to see what testing frameworks you've got configured. I'm going to tell it, and it asks you where the tests are. This one is a bit more advanced: this feature actually looks to see what testing framework you've got, instead of just generating tests in any random test framework. And it will put something together. I don't know what it's doing. Oh, okay. It's installing stuff.

I think it's configuring the tests now. It's like, configuring test suite or something. And then when it's finished, it should spit out a file somewhere. Curious how well it'll do. Yeah. I don't know what it's doing. Sometimes you have to ask it twice. It's like my children. They start doing it and then they get distracted. And then you're like, did you do the thing I just asked you to do? And then it's like, oh yeah, sorry, I forgot about that.

So if I just write it in words... God, Dad, you already asked me. Yeah, I'm doing it. I'm just replying to my friends. So if I say, create tests for the CardsDB class... what is it telling me? It's got a plan. I've got a plan. Let's do this. Whoa. Okay. There you go. It's thinking about it. This is its plan. It is. I think it's finished.

The models stream output, so they kind of write as they go. You have to wait for it to finish its answer. Okay, it has created a set of tests for your class. We can talk about this. It suggests putting it in the source folder, but it doesn't really matter. So I can just click on this button and it will paste that in. I don't even have to copy and paste it; it does it for me. On tenter...

Or tenterhooks? I never knew which one that was. It's not the one you... I always get it wrong as well. It has some weird old British meaning or something. It's like, oh yeah, the hooks you used to hang your hat on. Okay. So let's look at this. What has it done? And this is where we can nitpick. So, okay, let's nitpick this. It has created us some tests.

Now, something you always want to check is: did it cheat? Look at the existing test code. Exactly. Did it just go and look at the existing test code, copy and paste it, and go, look what I made? I don't think so. Yours would have been in... where would it be? No, it doesn't look like your code. Yours is very well-structured.

It was a couple of years ago that I wrote it. It probably isn't the best. We'll see. Okay. So, it has put it together. Let's work with the actual class on the right-hand side and the tests on the left. We don't need that comment, because it's self-explanatory. It created a fixture. That's cool. It's created a fixture, so I like that. Often it will use the information that it can gather about the project that you're in. So it's like,

you're in Python. When I said create tests, it didn't spit out JavaScript tests. So, well done, AI, you got the right programming language. But it will also look at what else you're doing, like, oh, it's a pytest project, or there's pytest in there somewhere, therefore let's make a pytest test module.
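
The generated fixture was presumably something along these lines. A sketch that assumes CardsDB takes a database path, as the cards project's class does:

```python
import pytest

import cards


@pytest.fixture()
def cards_db(tmp_path):
    """A fresh CardsDB backed by a temporary directory."""
    db = cards.CardsDB(tmp_path)
    yield db
    db.close()
```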

It also looked at the implementation, and it tends to do them in order. And I was specific as well: I said create tests for this class. I didn't just say, yeah, create some tests. Because with AI, the more specific you are with the instructions, the better the output.

And so if I said: create tests using pytest, and use fixtures where appropriate, and create multiple test methods for each thing you want to verify, and also create positive and negative tests, verify that it handles exceptions correctly, verify that it handles different types other than the ones which are annotated... you could give it a long list of things that you want. But I didn't do that. I just said, make some tests. So,

before you nitpick the output, it's important to remember: if I told an engineer, make some tests, you've got till two o'clock, what would they come up with? It depends on the person and what time it is. If it's 1:45 and I give them 15 minutes, they're probably going to make something pretty basic. If it's nine in the morning and they've got five hours to do it, then they should make a pretty decent test suite. Yeah. Yeah.

So I'm taking a look at the first test, test add card. It looks like it created a card, added it with add_card, got an ID back, and made sure that the ID is not None. I would say it hasn't actually checked that the card was added, just that an ID was returned. So not the best test, but not awful. I guess you're going to exercise the add_card function at least. So.
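
To make the nitpick concrete, a sketch of the difference, reusing the cards_db fixture from above. The first test mirrors what was generated; the second actually looks inside the database:

```python
from cards import Card


def test_add_card_returns_id(cards_db):
    # roughly what was generated: only checks that an ID came back
    card_id = cards_db.add_card(Card(summary="do something"))
    assert card_id is not None


def test_add_card_stores_card(cards_db):
    # stronger: check that the card actually ended up in the database
    card_id = cards_db.add_card(Card(summary="do something"))
    assert cards_db.get_card(card_id).summary == "do something"
```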

Yeah, and I guess it's got two tests for add card. So, again with the input, another trick with AI is to give it a number, give it a goal. Like the make-the-tests-by-two-o'clock thing. In this one, and this is a bit weird, but if I tell it which file I'm looking at and say,

write six tests... this is a really weird trick with the AIs: if you give it a number, it gives you better outputs. For the add card method on the CardsDB class, if I do that, it will have a think about what it's done, and it's going to suggest more tests, and they will test more scenarios.

And it compares now: this is what I did last time, and this time I'm going to do this. Okay, so it doesn't create a whole new test file; it's going to patch it. Yeah, this is like an edit feature. This is the difference between copying and pasting stuff out of ChatGPT and having it in the editor: it will actually do it inline, so you don't have to work out the differences. So yeah,

here we've got add card, and it's come up with some different scenarios that it wants to test, but only because I told it to. If I say make a test, it will just make one test, and it will be the simplest vanilla test you could think of. If I say make six, it will get a bit more creative. Yeah, hopefully six covers everything. Yeah, I don't know what the magic number is.

So this feature that I'm showing you, where it makes the tests for you: I generally don't think this is a good idea, and I reckon you're going to agree with me. So why would you think this is not a good idea? Well, partly the first test I looked at.

It sort of tested something, but it doesn't really test all of the conditions of the correct answer. Yeah. So if you look at the function, the method, it's got: if no summary, raise an exception. Oh, it did write a test for that. Like it said, let's verify that if...

Yeah, it did verify that. But then there are conditions around whether or not the owner is supplied. Yeah. And it didn't test for that. It also doesn't really test to make sure that the stuff actually ended up in the database. I'm curious, actually, to see what happens if I run the tests. Oh, you've got some extra parameters. It looks like you've got pytest-cov installed. Yeah, yeah.

Yeah, I'm curious to see what happens if I run the tests that it made. Do they pass? I'm bypassing your entire test infrastructure and just calling them directly, but it might work. No, it needs more stuff. So yeah, it didn't test that. Also, the tests are pretty basic. What it generated at a glance looks fine, but...

That's the part that I'm worried about: people relying on this to write their tests. They're going to look okay, but not really be the right thing. Yeah. So I'd argue that this is better than no tests. But that's a pretty low bar, right? I don't know if it's a low bar. Actually, I think tests that at least exercise your code are

better than nothing. But right, like you said, that's a low bar. You're not making sure that it's working correctly, just that it doesn't blow up. Yeah. So this is use case one: AI can write your tests for you, and

yes, it can kind of do that, but it will give you a very basic test suite. And you're skipping the important part, which is actually thinking about what you want to test. You're skipping the thought process and letting the AI do it for you, based on other code it's seen in the past. Yeah. Well, so can you ask it: given this stuff, what sort of stuff should I test? Exactly. So that's the scenario I call

hands off the keyboard. It even suggests it, actually. It's like, hey, let's generate tests for this. I think if you use that feature, it gives it a better prompt; they've been fine-tuning this. So... that one doesn't have fixtures. Yeah, it's a similar thing. It's making some tests, but because I only picked one method this time, and said make tests for this single method, it's created one, two, three... three tests.

And then it got bored and stopped. So if I tell it to make ten, it'll make ten. If I don't tell it how many, it will make two or three and then just get bored. So you generally want to be specific about how many you're expecting. And if you ask it a question...

So if I ask Copilot... and I'll just clear the chat as well. This is another trick that's really important: if it's going down the wrong path, don't keep asking it the question in different ways, because it uses its conversation as context. It will use wrong answers as context and just keep going down a path that you don't want it to go down. If I say, I want to write

some tests for CardsDB's add card method. I am a really good tester and I want to create amazing tests. What should I do? So, is that "I'm an amazing tester" sort of thing, is that stuff important? Unknowingly, yes. So I can show you another trick, which is because,

at the moment, the way that they're built, it doesn't know your context. So: I'm a noob, this is my first Python program, I want to create a test. That, as a question, is different from someone who has a lot of experience with Python and testing asking the same question. That's true, I guess.

And so if you're a beginner and you say, I want to make a test, it shouldn't go, okay, let's use fixtures and parametrized tests, and let's test all these scenarios, and let's introduce all these new concepts. You'd be lost immediately. Whereas... the issue I had with it just generating tests for you is that you just skipped the thought process, and it gets you to a green tick in pytest, but you might have missed some important things.

Whereas if you ask it the question, and in your question you've given it a hint as to what level you're at... actually, that's not what level I am. I'm not a really good tester. But if you give it that as context,

then it will give you a better answer. Because if I just say, I want to write tests, it will give you some textbook computer science tests, all about verifying. I actually kind of like this output. It's not overly verbose. It's good. Yeah. So it said: let's create one with a valid summary. Let's create one with a missing summary, and assert that the exception was raised. Let's create one with no owner. So that's the other condition you spotted.

Let's create a card with a specific owner. And it's kind of following the same AAA pattern. And it's saying,

for test scenario four: create a card with a specific owner, add it to the database, fetch it from the database by ID, and then assert that the owner of the fetched card matches the specified owner. So I'd argue that tests three and four are actually probably the best ones. They're actually checking that when you add a card to the database, it actually adds it, and when you fetch it back out again, it gives you what you wanted.

So even though I didn't ask it to, it's gone and made those four bullet points into tests. And test scenarios three and four are test methods. So it's got one here: make one with no owner, add the card, fetch the card, and check that the owner is empty.

So it's saying: if you don't specify an owner in your code, if the owner is None, then make the owner an empty string. It's actually written a test for that. And then the final one: it's gone, okay, let's create a card, let's add it, let's get it back from the database, and let's make sure that the owner actually matches what we put in at the beginning. It's not bad.
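
Written out, those four scenarios would look something like this sketch, using the cards_db fixture from earlier. The MissingSummary exception name is an assumption; the empty-string owner behavior is as described on screen:

```python
import pytest

import cards
from cards import Card


def test_add_card_with_valid_summary(cards_db):
    card_id = cards_db.add_card(Card(summary="valid summary"))
    assert cards_db.get_card(card_id).summary == "valid summary"


def test_add_card_with_missing_summary_raises(cards_db):
    # exception name assumed; the point is the raise-on-no-summary path
    with pytest.raises(cards.MissingSummary):
        cards_db.add_card(Card(summary=None))


def test_add_card_with_no_owner_stores_empty_string(cards_db):
    card_id = cards_db.add_card(Card(summary="no owner"))
    assert cards_db.get_card(card_id).owner == ""


def test_add_card_with_specific_owner_round_trips(cards_db):
    card_id = cards_db.add_card(Card(summary="task", owner="anthony"))
    assert cards_db.get_card(card_id).owner == "anthony"
```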

Yeah, so I like this. So the model really isn't "write tests for me", but it might write tests for you better if you don't ask it to; if you say, what tests should I write? Yeah, and you tell it that you're... I'm Brian Okken. I'm writing... I'm going to also see what happens if I...

So it used this slash tests command behind the scenes, and it also used some of the things I'd included in my prompt, like: what level am I at? What are my expectations? And I said, I'm a good tester and I want to create amazing tests.

So, you shouldn't personify the AI, but it just makes it easier to explain things sometimes when you do. It's like it tries that little bit extra harder. I think I might use this "I'm an experienced developer" prefix more, because sometimes when I just want a quick answer to something, I pop in a quick question and I get an encyclopedia entry back. Yeah. I hate that. Yeah.

It's like the guy at a party that is really into something and you ask the wrong question and it's a wind-up doll.

But anyway, would you recommend people do this, then? The what-tests-should-I-write method? Yeah, I think so. I like this. I feel like if you're not writing tests today, or there's some piece of your code that you want to go and create some tests for, then this helps you get started. So I'd call this scaffolding, test scaffolding, where you've got nothing today.

And so if I delete this and I paste that in, we chuck in the tests that it made, and then we save that. And you will have to pip install tinydb. Okay. Okay. And hopefully they haven't changed their database again. So yeah, that's kind of the scaffolding use case.

And then the one that I use it for a lot more than this is when I'm in the middle of a project. Actually, I'm going to go and pick your test code, because I think this will be better. Okay, so this is where we do completions. Let's see if we can run your tests. See, this is where I should actually be installing your requirements instead of just guessing them one at a time.

Well, that's one of the reasons why I might rewrite the book: so that I have a project with no dependencies. Okay. So in this one, you've got a test. What are you testing? You're testing that if you add a card and then you run the finish method, the state becomes done. That's what I'm understanding from this. Yeah. Yep. Are there any scenarios that we missed?

Are there any additional tests that you think we should add here? Well, I don't, because I intentionally made it 100% test coverage and behavior coverage, but...

Yeah, okay. So the other scenario is that as you're writing tests, it predicts the next test for you. Oh, finish twice. Yeah, cool. So I'm just going to go down this path and see where it leads us. Now, the important thing here is that I haven't done anything other than press enter, and it thinks I want to make a new test called test_finish_twice.

And it knows that I'm probably going to want the cards_db fixture, because that's what the other tests do, right? It's going to write the docstring for me. And should you be able to finish a card twice? I don't know, Brian. Well, I made the decision that you can, but it's actually tested above that you can. Oh, okay. Finish. Oh, like if it's already done and you do finish, then it should stay as done. Yeah. Okay.

I should have picked somebody's code that wasn't Brian Okken's. But actually, one of the reasons why I did that is because I wanted to talk about how writing tests is a great way to think about your requirements. This is a question about the application: should you be able to finish something twice? Don't know. And should it raise an exception if you do? Yeah. So...

Let's see what it's done. It's assuming that it should raise InvalidCardId if I try to finish a card a second time. Yeah. Well, I mean, if that was the decision, to not be able to, that would probably be correct. Yeah. So it will generally do things like this. It uses the context: it looks at the tests you've already written, and it doesn't write them in a generic way; it writes them in the way that you've been doing so far.
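
A sketch of what that suggestion roughly was. InvalidCardId is the exception name mentioned on screen, and as discussed, this guess doesn't match the app's actual behavior:

```python
import pytest

import cards
from cards import Card


def test_finish_twice(cards_db):
    """The completion's guess: a second finish should raise."""
    card_id = cards_db.add_card(Card(summary="something"))
    cards_db.finish(card_id)
    with pytest.raises(cards.InvalidCardId):
        cards_db.finish(card_id)
```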

Even with the docstring, that's nice. Yeah, and it will use the fact that you finished your docstring with a full stop that time; it copies that. I find this super useful when you're building out tests: you've written one, and then you want to test a slight variation, and that would often require you to write a lot of the same test code over and over and over again.

My daughter tells me that ending a sentence in a full stop means that I'm angry. Depends how hard you hit the key. Whereas, you know, I was telling you about the code comment trick, where you're basically giving it instructions. The other option is that it will know, based on the name of the test function, what it is I'm trying to do. So if I say test

finish twice is okay, then it will change what we did last time, and it knows now based on the name of the function. So this is my super lazy guide to testing: I basically just write the name of a pytest test with the thing I want to do, and most of the time it just writes it for me. This might get people to write better test names. Yes. That's a good thing.

That's awesome. I like it. So yeah, it's created a card, it's called the finish method twice, it checks the ID, and it's checking the state is done. I just told it, based on the name of the test, what it was I wanted to do, and it did it for me.
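
A sketch of what that name-driven completion produced, matching the behavior Brian actually chose:

```python
from cards import Card


def test_finish_twice_is_okay(cards_db):
    """Finishing a card twice should leave it in the done state."""
    card_id = cards_db.add_card(Card(summary="something"))
    cards_db.finish(card_id)
    cards_db.finish(card_id)  # second finish should be harmless
    assert cards_db.get_card(card_id).state == "done"
```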

So that's a roundabout way of getting to the two scenarios where I think this is useful. One: getting it to make you think about what it is you want to test. If you ask it, and you tell it that you're a super awesome programmer, and you get it to think about the testing, it will give you a reasonable answer to get you started. And then once you've done that, if you just start a new line, it will suggest another test; or if you write the name of a test with what it is you want to verify, it will actually implement most of it or all of it for you. This is cool.

Before we wrap it up for today, though, we didn't really do an introduction. Where can people find you if they want to find out more? Yeah, I'm mostly on Mastodon and Bluesky these days. We'll leave links. Yeah, yeah. And I also have a blog that I occasionally write on, so there's some stuff on my blog.

I've also got a book: if you're interested in CPython internals, the compiler, stuff like that, you can check that out. Yeah, it's an awesome book. So cool. We'll talk later. Cool. Thanks, Brian.

Thanks to everyone who supported the show via Patreon or by learning PyTest from a course at courses.pythontest.com. I've made a change recently that I'm pretty excited about. The Complete PyTest Course is now the Complete PyTest Course Bundle. It was one big course and is now three courses.

since you really need them at different times in your PyTest journey. Part 1, PyTest Primary Power, provides a gentle introduction up through covering the superpowers of PyTest, including fixtures, parameterization, markers, and more. Grab primary power and get to work. Part 2, Using PyTest with Projects, has strategic topics like debugging, mocking, continuous integration. Part 2 is perfect for when you're applying PyTest to a work or open source project.

Part 3, PyTest Booster Rockets, explores plugins, both third-party and building your own,

and advanced parameterization. Although part three can be taken right after the other two, it makes more sense to live with PyTest for a while and then explore what more you can do with it. I've also added new intro videos at the beginning of each course, congratulations videos at the end, and printable certificates. Anyway, I think the new structure makes a lot more sense. These are all at courses.pythontest.com. That's all for today. Thanks for listening. Now go ahead and test something.