EP 459: OpenAI’s Best AI Agent? The correct way to use ChatGPT’s operator agent

2025/2/11

Everyday AI Podcast – An AI and ChatGPT Podcast

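Jordan Wilson: I think OpenAI's Operator agent is the best AI agent available right now. Many people use it the wrong way, for research, when it is actually better suited to carrying out tasks across multiple websites and software services. Operator can visit multiple websites, copy and paste information, and log into different products. It can handle knowledge work, connect multiple services, and take on time-consuming repetitive tasks. I recommend against the preset prompts OpenAI showed in its demos, such as booking a restaurant or buying movie tickets, because humans can finish those tasks faster. Instead, use Operator for basic research, reading, writing, summarizing, and data analysis tasks that cannot be done in ChatGPT or Deep Research. I demonstrate using Operator to run research, pull information from Google Gemini, create slides, and send an email, showing how it applies to real work.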


Chapters

This chapter introduces OpenAI's Operator, a computer-using agent built on GPT-4o. It explains its functionality, access and availability, and limitations, including the need for an active browser tab and the challenges of handling multiple tasks simultaneously.

Operator is a research preview of an agent that uses its own browser to perform tasks.
It uses GPT-4o to interpret screenshots and interact with websites.
Currently available to users with the $200/month Pro plan, with rollout to $20/month Plus users planned.
Limitations include slow performance with multiple tasks and the need for an active browser tab.

Transcript


This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life.

There's always been this enormous upside of generative AI and large language models, right? From the early times of the ChatGPT moment of generative AI to all these new updates in between. But I think many of us have realized that potential, especially if you're a daily listener of the show, but I think a lot of the rest of the business world hasn't.

There's always those doubters out there that are like, okay, cool, this AI can go do this, but when will it just go and do my work for me? Well, with OpenAI's Operator, that's exactly what can happen, and that's exactly what we're going to be showing you today.

All right, what's going on y'all? My name is Jordan Wilson and welcome to Everyday AI. This is your daily live stream podcast and free daily newsletter, helping us all not just keep up with AI, but how we can use it to get ahead to grow our companies and our careers. Because, you know, efficiencies and optimizations are one thing, but when we can actually use them to grow, that's the whole next step that we all need to take.

And you can take that next step, if you haven't already, on our website. So if you're new here, please go to youreverydayai.com and sign up for the free daily newsletter. We're going to be recapping this very show and have a nice write-up with some additional resources, but we're also going to keep you in the loop with everything else going on in the world of AI so you can be the smartest person in your company when it comes to generative AI and large language models.

Also on our site, you have to go check out the 2025 AI Predictions and Roadmap series. It's actually been so helpful to so many people, even though it's from a couple of weeks ago. I think we're going to do some slight updates and run it again because I think it is that important that you all listen to this. So it was five very short episodes on our website at youreverydayai.com. Look for that 2025 AI Predictions and Roadmap series.

All right. Normally we start off each and every day by going over what's new and noteworthy in the AI news. Today's show is going to be a very detailed one with a lot of screen sharing. We're doing live demos. I'm going to literally show you on today's show how Operator is doing my work. All right. So if you want the AI news, make sure to go check that out in the newsletter.

Also, if this show is helpful, and I'm going to remind you of this again at the end, make sure to repost this on LinkedIn or Twitter. I'm going to give you all, whoever reposts this, the complete instruction set that I use inside of Operator, which takes a long time to configure and get right. As well, for anyone that does repost this on LinkedIn, I'm going to be entering y'all into a drawing for a free 90-minute consult so I can help you set up Operator for your team, answer your generative AI questions, teach you ChatGPT, whatever it is. And we're going to be giving that away in our newsletter.

All right, that's enough chitchat. Let's get straight into it. Is this OpenAI's best AI agent? And a lot of people are using Operator for the exact wrong use cases. All right, so let me just answer this. Yes, I think this is OpenAI's best AI agent. And people are using it for the exact wrong reasons, right?

So Operator actually came out before deep research. So, you know, a lot of people just flooded, you know, all the thread boys, I think is what they're called, on Twitter and LinkedIn. Right. They go gather all these use cases and they're like, oh, you know, Operator's amazing. It's going to, you know, change the game and blah, blah, blah. Right. But because this was before OpenAI's deep research, a lot of these use cases were two things. One, it was very much what OpenAI demoed, which I think was wrong. So I'll get to that in a couple of minutes. And then it was just doing a lot of research. But OpenAI's deep research came out shortly thereafter. I believe deep research came out on January 31st. So you shouldn't be using Operator to just go research.

This is an agent, a very smart agent that can work across multiple websites, copy and paste things, you know, across different products. You can give it credentials to log in to whatever you're using. So I think people are using this in probably like the worst way possible, right? So you have to keep in mind OpenAI's other tools.

But I do think, you know, OpenAI has officially said they've released two agents. So one in Operator, one in deep research. But I'm almost going to call it like two and a half because I think ChatGPT Tasks, where you can essentially schedule anything in ChatGPT, actually has some agentic capability when you work with it the correct way, right? And when you use your brain. We taught you all how to do that. We did a great show, I thought it was a great show, on ChatGPT Tasks. So when you do something called task stacking and use the context of the chat, it does have agentic capabilities. It can have agency. It can make decisions. It can create new things for you autonomously. So, you know, OpenAI will say they've released two agents, one in Operator, one in deep research. I'll say it's 2.5 because, like I said, I think ChatGPT Tasks is pretty much there as well. All right. So let's get into the details.

The definitions first, y'all, and then we're going to get to doing this live. And hey, good morning to everyone joining us. So Pedro joining the show from Madrid, Jason from Florida, Douglas, Rolando, Acham, Harvey Castro, Christopher, everyone else, Michael, Big Bogey Face, thanks for tuning in. If you guys have questions about Operator, get them in now. You know what? If we have time, I might be able to run an Operator question or two. All right. We'll see.

So what the heck is Operator? All right. So this is from OpenAI. They said this is a research preview. Keep that in mind. This is the worst it's ever going to be. So this is a research preview of an agent that can use its own browser to perform tasks for you. All right. This was their first official agent release. And if you keep hearing the word CUA, all right, that's what this is. This is a computer-using agent, or CUA. All right. So it uses GPT-4o to quote unquote see screenshots and then it operates a virtual computer. So it is in a slightly different interface than the normal ChatGPT, although it is essentially the same thing. So it has its own dedicated interface.

You talk to Operator just like you would ChatGPT. It essentially takes a lot of screenshots. It uses computer vision, and then it essentially controls a mouse and a keyboard on a virtual machine. And you can take control at any time.

Keep in mind, there's obviously some limits, right? And I'm going to walk you through some of those things. The virtual machine is not very powerful, right? So if you want to go, you know, render something in a video editing program on this virtual machine that would normally require local computer power, it's not going to work very well, right? Also, in the same way that my computer will slow down if I have 30 tabs open, so too will this Operator from OpenAI. So keep that in mind. There's some limitations. You are using a virtual machine, and it may slow down if you're trying to do too many things at the same time.

All right. Let's talk about access and availability. Well, right now it's available to anyone with the $200 a month Pro plan. So that is what I'm using. And OpenAI CEO Sam Altman did say that this will be rolling out to Plus users, so that is the $20 a month plan, in the coming months. So, you know, if that means one, two months, three, four months, eight months, we don't know, right? We could see a very long, like Sora-esque rollout where it was eight months, or we could see it drop in a couple of weeks. So right now it is only available for those on the $200 a month Pro plan, which is what I'm using. But like I said, it is going to be coming out at some point in the near future.

So how the heck does this thing work, right? Well, according to OpenAI, this is how it works. So it says Operator uses a model called Computer-Using Agent, or CUA, built on GPT-4o to interpret screenshots and interact with sites using typical browser controls like a cursor and mouse.

You describe the task, for example, book a flight or order groceries, and Operator executes the necessary steps. If it encounters a challenge like a CAPTCHA or password field, it will pause and prompt you to take over, ensuring you stay in control.
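Editor's sketch: the screenshot-then-act loop described above can be pictured roughly as follows. This is a minimal illustration under stated assumptions, not OpenAI's implementation or API. Every name here (`Action`, `run_agent`, `take_screenshot`, `choose_action`) is hypothetical, and a "screenshot" is stood in for by a plain text description of the screen.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical names throughout; this is not OpenAI's API.

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll", "handoff", or "done"
    detail: str = ""

# Screens that should trigger a pause so the user can take over.
SENSITIVE = ("captcha", "password")

def run_agent(task: str,
              take_screenshot: Callable[[], str],
              choose_action: Callable[[str, str], Action],
              max_steps: int = 20) -> List[Action]:
    """Look at the screen, pick one action, and repeat until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()
        # Pause and hand control back to the user on sensitive screens.
        if any(word in screen.lower() for word in SENSITIVE):
            history.append(Action("handoff", "user must take over"))
            break
        action = choose_action(task, screen)
        history.append(action)
        if action.kind == "done":
            break
    return history

# Toy run with three fake screens; the third one contains a password field.
screens = iter(["search page", "results page", "Password field"])
log = run_agent("book a flight",
                take_screenshot=lambda: next(screens),
                choose_action=lambda task, screen: Action("click", screen))
print([a.kind for a in log])
```

In the toy run the loop clicks through the first two screens and then stops with a handoff, mirroring the pause-and-prompt behavior OpenAI describes.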

I mean, just call this out right now. These are the absolute worst things to do with operator, right? And I'll tell you why here in a minute, but we're not going to do any of these things that OpenAI suggests because it's a terrible use of your time. And I think it's a terrible use of their technology to use it how they've suggested both on their website and when they demoed it.

All right. Let's talk about limits because everyone always wants to know. All right. Well, if I'm going to pay $200 a month, can I just have like 80 instances of this thing going at once? Well, like I said, it will slow down, right? Just like a normal computer would. Each time you start a new Operator chat, think of it like this: you're running an old computer from 10 years ago, right? You should probably only be doing a couple of things at once. You should probably have a couple of tabs open at once. But each new Operator chat that you start is essentially starting a new virtual machine.

However, keep this in mind: right now, you have to have the tab or the window active for it to keep going. So I've been trying some workarounds, right? As an example, if I'm using Chrome or Edge, launching a new profile and continuing to work. So hopefully it'll actually work here, you know, because I'm in the same instance. I'm using Chrome right now. So we might only be able to do one thing at a time for that very reason. However, you know, there are some nice workarounds, but you do have to have it active and open, right?

If you listened, y'all, to my 2025 AI Roadmap series, I said virtual machines and second computers are going to be huge in 2025. And here we are a couple of weeks after that show debuted. And yeah, now you see why, right? Now I'm happy I have a stockpile of extra computers because I can just, you know, launch maybe two Operators, give them extremely detailed multi-step tasks and have them literally do my work for me. Right. But I have to wait.

So presumably we'll see the same thing with Google's Mariner, which is essentially their computer-using agent that will hopefully be rolling out in the coming weeks and months. I did talk about that a little bit with Google's Logan Kilpatrick on Friday, if you want to go back and listen to that conversation. But now's a great time, if you have that extra machine, to go ahead and do this because, like Operator, Google's Mariner, which will work as a Chrome extension, needs an active window or an active tab because it's essentially using the instance of your browser to use a virtual machine. That's how Operator works and how Mariner works. It's literally using your browser, so you can't do anything else. All right.

So now, if you're like me and you're a slight computer hoarder, this is where it pays off, right? Because you can go set it up and essentially have one computer just always doing your work over there in the corner. But you got to put the work into it and you have to know how this works and how it doesn't. All right.

So can Operator handle multiple tasks at once? So yes, Operator does allow you to run multiple tasks in parallel. However, for security reasons, Operator places dynamic limits on the number of simultaneous tasks and open conversations you can have at any given time. And these limits may change. Yeah. So there's no hard limit. It's not like, oh, you can run two things at once or three things at once. It's dynamic, which is going to make a live demo kind of tricky because we might run into limits. You know, I tested everything last night. Everything was going well. But I mean, we'll see how it actually works. Right.

All right. So let's talk about how to actually use it. It's not in the same ChatGPT interface. You can go to operator.chatgpt.com. Again, you have to be on that $200 Pro plan, otherwise this isn't going to work. Or you can log into your normal ChatGPT account and there will be an Operator icon in the left-hand corner where you would normally see your GPTs. And then, like I said, it has to be an active window or tab.

So what the heck should you use this for? Right. Here's where I'm going to enjoy this slightly Hot Take Tuesday. Right. I actually might have a hot take Wednesday for y'all tomorrow if you want it. So number one on what types of tasks should you be giving to Operator? Probably not what you would think.

Okay. Because first you have to know and understand OpenAI's full tool set. So here's what I mean by that. You have to understand ChatGPT Tasks. All right. And please, please, please, y'all, go listen to my ChatGPT Tasks show. All right. It's funny. I actually had someone from OpenAI reach out after that show and they were like, you know, this was great, I learned so much from listening to this, which I was kind of shocked by. Right. So you need to go listen to that Tasks show because I don't think people understand how powerful ChatGPT Tasks is. So that's episode 440. Go listen to it. So again, before using Operator, you have to understand Tasks and you have to understand task stacking. All right. So you can literally go back and reshare that show. We put together a huge guide on task stacking. All right.

Then you have to understand ChatGPT's new mode, o3-mini plus ChatGPT search. All right. So a reasoning model that has access to the internet, because that can also change what you think you might want to use Operator for. Right. So a lot of the things that you're thinking, oh, I'll use Operator to go do A, B, and C, it's probably already available and you just didn't know how to use it.

So go listen to our o3-mini show as well. Right. And I'm not just saying this. Like, you know, I don't get paid $20 every time you go listen to a podcast. I get paid nothing. All right. I'm doing this to save you time. All right. And to help you and your company get the most out of generative AI. So go listen to episode 456 on o3-mini-high.

And then you have to understand deep research. All right, we covered that in episode 454. All right, so OpenAI's deep research is outstanding. That is their other agent. They released it, I believe, on the last day of January, so just less than two weeks ago.

So you have to understand those three or four tools or modes within ChatGPT because a lot of the things that I see people do, right? I go out and I read people's reviews or watch people's videos and I'm like, y'all are using this wrong. This is like the absolute worst thing to do because Operator is slow. It is slow. In many instances, it is slower than a human. So you have to keep that in mind on the type of agency you are handing over to an agent. Don't hand over something that is actually going to take the agent longer.

So what type of task should you give it? Like I said, don't give it anything OpenAI used in their demo. In their demo and on their blog posts, OpenAI leans very heavily on these, and I don't know why. Maybe because talking about these things, I don't know, helps you imagine a future where everyone has a Jarvis, right? So they're like trying to order, you know, tickets to an NBA game and trying to order groceries, right? Don't do that. Don't do that just because you can, right? I think they're trying to perform transactions and they're trying to show everyone, oh, you can go buy things on the internet, right? And let Operator do that.

Number one, it's way too time intensive. You are not going to win back your time doing that because unfortunately, even when you try to over-prompt it, Operator is still going to ask you a lot of questions. All right. An agent is not an agent if it has to ask you more questions and if it takes more time than it would take for you to do it on your own. So yeah, OpenAI's demos of, you know, reserving a table at a restaurant, ordering tickets, ordering groceries, in my opinion, those are terrible use cases, because those are things that the human can probably do two to three times faster. And it's actually quite a frustrating experience.

I think one of the reasons, right, just being honest, why they probably demoed that is OpenAI is using this as training data, right? And I get it. We need all of that training data in order to build the next version of Operator and the next agentic system. So I get it. I get why they're probably pushing those things. And sure, maybe someone might find, you know, it's a nice party trick, but I don't know. I don't want to sit there and answer, you know, four to nine questions just to reserve a table at a restaurant, right? It doesn't make sense to me. I want to hand off to Operator as much of my day-to-day work as possible. Sit back, go warm up my coffee and go do something else, right? That's the point of having an agent.

So you should not be using the prepackaged prompt ideas. Do not use them. All right. What you should be doing is any basic research, reading, writing, summarizing, or data analysis task that cannot be done in ChatGPT or deep research, right? So you should be doing these knowledge work tasks that involve you going into multiple websites, multiple software services. That's what you should be focusing on.

All right. So like I said, reading and writing across different domains and services, that's number one. That's something a large language model is better at. It's faster at it, right. It can, you know, summarize and synthesize much better, much faster than any human. All right. So any knowledge work task connecting multiple services or any manual repetitive tasks that are time consuming and happen across multiple domains.

Let's look live. Are you guys ready? This could go horribly, if I'm being honest. Let's see how this works. And one of the main reasons is because I have to always have this active tab. So even when I'm trying to copy and paste some stuff over, it might not work very well.

All right. So live stream audience, if you could, please let me know when you can see my screen. All right. So right now I have Operator open. I'm going to start on this right away. All right. And then I'm going to walk you through what's happening. Podcast audience, I always put the link to this show. So this is going to be a very visual process. I'm going to try to do my best to describe to you what's going on, but if you want to actually see it with your eyes, all right, we always leave the link to our website. On the website, we put the YouTube video, or you can go watch it on LinkedIn.

All right. So I just pasted a prompt in, all right, and I'm going to, all right, thanks, live stream audience, that you can see. All right. So I'm going to go ahead and click this button here that says expand. Well, actually, I'm not. You know what, I have multiple screens here. Let's do this. All right. Hopefully, hopefully. Of course, it did this. I literally just signed in to my Gmail account before this started. Tested it, it worked fine. So sometimes you have to enter in your credentials multiple times. So I was hoping I wouldn't have to do this and I was hoping that we could do this whole thing autonomously. All right, so give me a second. I'm logging into my, this is my personal Gmail. So please don't spam me. I guess you can if you want. All right, all right.

Now I am super, super zoomed in here and I can't zoom out. So give me a second. All right, there we go. So now the screen sharing should be back.

Are you still running in circles trying to figure out how to actually grow your business with AI? Maybe your company has been tinkering with large language models for a year or more, but can't really get traction to find ROI on Gen AI. Hey, this is Jordan Wilson, host of this very podcast. Companies like Adobe, Microsoft, and NVIDIA have partnered with us because they trust our expertise in educating the masses around generative AI to get ahead. And some of the most innovative companies in the country hire us to help with their AI strategy and to train hundreds of their employees on how to use Gen AI. So whether you're looking for ChatGPT training for thousands or just need help building your front-end AI strategy, you can partner with us too, just like some of the biggest companies in the world do. Go to youreverydayai.com slash partner to get in contact with our team, or you can just click on the partner section of our website. We'll help you stop running in those AI circles and help get your team ahead and build a straight path to ROI on Gen AI.

I told y'all, I am not always a fan of doing this live, even though I know y'all love doing these things live. So let's see if I can get this to work because of course it worked one shot when I demoed it last night.

All right, here we go. So here's what I told Operator to do. So I copied and pasted this in. All right. And all I did so far is I had to log into my Gmail. I'll tell you why. So I said, step one, go to gemini.google.com and ask it to complete a very basic SWOT report for the Everyday AI podcast by Jordan Wilson. Then hit enter. And to our live stream audience, you see it's working on its own right now. My hands are right here. I'm not typing this in.

All right. I said, step two, then go to Google Slides and copy and paste the input and outputs from that Google Gemini prompt and response. And then I'm explaining, this is after using it a little bit, I'm saying sometimes it may ask you to install an extension for copying and pasting. If so, allow it. If not, go about your copying and pasting. Use your best discretion on formatting. The Google Slides doc should only be five pages long. Number one, title page. Two, strengths. So this is SWOT, right? So essentially a title page and then a page each for strengths, weaknesses, opportunities, threats. Step three, export the Google Slides as a PDF doc.

Step four, log into my Gmail and then send that PDF report to info at youreverydayai.com. Write a short subject line and a one-sentence email summary. And then I'm saying, do not, and this part is important, y'all. I've been playing with Operator a lot over the past like two weeks. So I'm saying, do not ask me for permission for anything. Use your best judgment. Please complete this autonomously. If you run into any issues, try a second time. If your second attempt doesn't work, then try another route or get creative in accomplishing the goal. The only important thing for you to do is to finish all four steps without human input. Please complete this task autonomously.

So yeah, you'll notice that I did, multiple times, remind Operator like, yo, don't talk to me, right? I'm not here to be your friend, right? You have a job to do. Go do this autonomously. I gave you detailed directions. Take your time. Make sure you get this done correctly, all right?
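Editor's sketch: the prompt read out above has a reusable shape, namely numbered steps followed by standing autonomy rules and a closing reminder of how many steps must be finished. One way to assemble such a prompt is shown below. The helper `build_prompt` and the constant `AUTONOMY_RULES` are hypothetical names, the step wording is paraphrased from the episode, and this is not the host's actual instruction set.

```python
from typing import List

# Standing instructions paraphrased from the prompt read out in the episode.
AUTONOMY_RULES = (
    "Do not ask me for permission for anything. Use your best judgment. "
    "If you run into any issues, try a second time. If your second attempt "
    "doesn't work, then try another route or get creative in accomplishing "
    "the goal. Please complete this task autonomously."
)

def build_prompt(steps: List[str], rules: str = AUTONOMY_RULES) -> str:
    """Number the steps, then append the autonomy rules and a step count."""
    numbered = [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    closing = f"Finish all {len(steps)} steps without human input."
    return "\n".join(numbered + [rules, closing])

steps = [
    "Go to gemini.google.com and ask for a very basic SWOT report "
    "on the Everyday AI podcast by Jordan Wilson.",
    "Go to Google Slides and paste the Gemini prompt and response "
    "into a five-page deck: title, strengths, weaknesses, opportunities, threats.",
    "Export the Google Slides deck as a PDF.",
    "Log into Gmail and send the PDF to info@youreverydayai.com "
    "with a short subject line and a one-sentence summary.",
]

prompt = build_prompt(steps)
print(prompt)
```

Keeping the autonomy rules separate from the steps makes it easy to reuse the same rules across different multi-step tasks.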

So you'll see over here, for my live stream audience, I'm kind of clicking through this, and a couple of things to know. So you can see a kind of summarized chain of thought on what Operator is doing. So remember, this is based off of GPT-4o, but we almost get this O level, right, of the o-series, the reasoning models. We almost get that kind of under-the-hood look of what it's doing. Also know at any time you can go back and replay this if you want, right? And I would highly encourage you to do this, right? So even if you don't have the $200 a month Pro plan that you need to access this right now, when this does come out to maybe the Plus plan, I encourage you, you have to always look at this kind of summarized chain of thought. You have to see and understand what it's doing. So you get to that by clicking this expand button. Okay, so otherwise you can't really follow along.

So I click this expand browser window button. Again, I'm in the Operator interface and you'll see right here it says one task in progress. I didn't want to do two simultaneous tasks. All right. So we can hopefully really walk and talk through it.

You'll see also when I hover over my virtual screen here, it says take control. So at any point, if something is going wrong, I can click take control. Right now, I don't need to. I had to log in even though literally right before I hit record, this was working fine. But there's always a human in the loop, right? But in my prompting, I really pushed and requested Operator to do this all on its own, right? There's no point in using an agent to do a task that would take you five minutes if working with Operator takes you eight minutes. That makes no sense, right? So you are going to have to put a little bit of work into, you know, prompt engineering 101. All right. You're going to have to put in some work into learning. All right.
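Editor's sketch: the five-minutes-versus-eight-minutes point can be written as a one-line rule of thumb. The framing is an assumption on the editor's part: it compares the minutes a task takes you alone with the minutes you personally spend attending the agent (logging in, answering its questions, watching it). The function name `worth_delegating` is hypothetical.

```python
def worth_delegating(solo_minutes: float, attended_minutes: float) -> bool:
    """Delegate only if your own hands-on time actually goes down."""
    return attended_minutes < solo_minutes

# The case from the episode: a five-minute task that costs you eight with the agent.
print(worth_delegating(solo_minutes=5, attended_minutes=8))

# A long multi-site task where you only spend a couple of minutes supervising.
print(worth_delegating(solo_minutes=30, attended_minutes=2))
```

Under this rule the restaurant-booking style of task fails the check, while a long cross-site task that runs mostly unattended passes it.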

As an example, I'm looking down and I'm seeing what's happening here, right? I can see the actual step-by-step how this is thinking. So right now I can see it was struggling to scroll down on the page. So it's about halfway done with this task. So it was struggling to find the opportunity section of the SWOT report that I asked it to generate. So again, let's even back up. So we started in Operator.

And then I had Operator log into Google Gemini, right? So unfortunately, Operator right now can't use Operator, right? But it can use a lot of other tools that you would log into, which is great. I would assume that as computer-using agents become more and more prominent, some websites are going to figure out how to block these virtual machines, how to block this virtual traffic, right? At least for me, it was showing up as like a device in Iowa. I know I read a couple of months ago that OpenAI and Microsoft and others were looking at data centers in Iowa. So I'm not sure if that's what it is or if it's always just going to dynamically show up in a new place. So you will probably have to do a lot of two-factor authentication if you are logging into sites that require your credentials, right?

But in my opinion, that's what you should be doing. So I wouldn't be, again, I wouldn't be uploading sensitive proprietary documents, anything like that. You know, right now, this is just my personal Gmail account, but I'm having it go in, open Google Gemini, all right, run a research task, right? This is something that I would normally be doing.

And you'll see it's already done. So right now it completed the presentation. It looks like it's downloading it right now. And again, I'll walk everyone through this. I want to get the second prompt started. But it's already downloaded the file, all right? So I asked it. I said, hey, Operator, go out, use Google Gemini, then go create. So it's working between Google Gemini and Google Slides. It's copying and pasting all this information. It was even resizing text, right? Because it would enter a text box and it didn't fit. So it was resizing it all. And it's pretty impressive because it's doing this all with screenshots, all right?

Let's see. So it looks like it might have stopped there. So yeah, unfortunately it did not complete the entire task. Let's see if I can just re-enter this and have it continue on. Again, y'all, like maybe I'll share the video, but it literally did this entire thing last night. But, you know, generative AI is generative. It's a roll of the dice. It's going to be a little bit different. So it looks like it didn't do steps three and four, which was emailing this to myself. So now I just repasted that in there. So it's going into my Gmail account. It's clicking on compose.

All right. So now let's see. It looks like it's finding that pretty quickly there. It entered my email, the info at everyday AI. This is where it generally struggles, attaching files. So it essentially has this right here, a file system. And I told it. Over time, I found out where Operator kind of keeps the files that it downloads, because it's on a virtual machine. I'll probably have to fine-tune those instructions a little bit because I know it's in that OAI, that OpenAI folder and a shared folder. So for whatever reason, I need to add a little bit more detailed instructions about where to find it because right now Operator is struggling to remember. So it's in that shared folder. So we'll see, it kind of double-clicks in there.

So yeah, for whatever reason, it is struggling right now to find files, but that's fine. All right. So I'm going to go ahead. I'm going to stop this task. So we'll give it maybe, I don't know, a B or a C on that one. But let's do something even more difficult, right? That makes sense. You know, if it fails at a task, that's a, you know, three out of 10, let's give it something that's even harder to do, right? That makes sense.

All right, so now, live stream audience, you see this. This is very long. This is very long, all right? I'm giving it a very, very difficult task. So this is something I do all the time, right? I'm not asking it to go order my pizza or go to a restaurant or, you know, go find me tickets to the Warriors game, whatever. All right. So here's what I'm telling it. And I'm also intentionally being a little vague. All right. So I said, for this task, you will find a trending topic in generative AI and research potential Hot Take Tuesday topics for an Everyday AI podcast.

So I'm saying, before I give it its steps, I'm kind of walking it through what's happening. And live stream audience, you can already see it's on my website. It's searching, but I'm going to walk our podcast audience through how we got there. So I'm saying you will research a Google URL, identifying an interesting trend or story that will be a good podcast episode. Then you will use Google Gemini's deep research tool to conduct more in-depth research on that topic. Also, you will make sure to look at the context of this chat. That is important, y'all.

Right. Because what I'm going to do is I'm going to run this task probably a couple of times a week and I don't want it to keep suggesting the same thing over and over. So I'm telling it, yo, look back at the context of this chat. So don't suggest something to me you've already done.

All right. So then I'm saying step one: first, you will go to the Everyday AI podcast episodes page. So I didn't give it the URL. I wanted to see. So what it did is it went to Bing. It typed in Everyday AI podcast. It went to the homepage. Then it went to the episode page. It did this on its own and it clicked the search button. I wasn't looking at it closely because I was looking at my prompt here on the other screen. Let me just go through, kind of check my chain of thought a little bit. Let's see what it did. Yep. Okay. So then it clicked the search button and it searched for Hot Take Tuesday, right? So those are my Tuesday episodes where sometimes I bring in hot takes.

All right. So now, all right, it's working this time, y'all, with no hands. This is good. So then I'm saying: you need to go look at all of my Hot Take Tuesday episodes so you understand the type of topics. Then I gave it essentially a Boolean search on Google, right? And this Boolean search essentially brings up AI news over the last 24 hours from a bunch of big companies. So it's a very advanced Google search. So I copied and pasted that long URL string in there. All right.
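For anyone following along at home, a Boolean search URL along these lines can be sketched in a few lines of Python. This is only a rough sketch, not the actual URL used on the show: the company list is made up for illustration, and `tbm=nws` (news results) and `tbs=qdr:d` (past 24 hours) are commonly used Google query parameters that are assumed here rather than officially documented.

```python
from urllib.parse import urlencode

def build_news_search_url(companies, topic="AI"):
    """Build a Google search URL with a Boolean OR group of company names."""
    # Quote each name so multi-word names are matched as phrases
    company_clause = " OR ".join(f'"{name}"' for name in companies)
    params = {
        "q": f"({company_clause}) {topic}",
        "tbm": "nws",    # assumption: restrict results to the News tab
        "tbs": "qdr:d",  # assumption: restrict results to the past day
    }
    return "https://www.google.com/search?" + urlencode(params)

# Hypothetical company list, not the one from the episode
url = build_news_search_url(["OpenAI", "Google DeepMind", "Microsoft"])
print(url)
```

Pasting the printed URL into the agent's instructions gives it a fixed starting point, so it does not have to compose the search itself.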

And then I said, this shows you when you paste this into Google, this shows you some of the top AI news stories for the week. Step three, you will identify one trending topic that could make a good episode idea for everyday AI. Again, pay close attention to the types of Hot Take Tuesday episodes that we've already covered. Step four, you will research that topic.

This is what's happening on the screen now, and it's going to take a couple of minutes. You will research that topic using Google Gemini's deep research feature. All right. You will go to gemini.google.com, sign in with the account that is on the screen. It did that. I said, do not skip that part. So this time without me typing it in, it properly logged into my Google Gemini account.

I have a paid account. And then I said: Google Gemini's deep research is an AI tool that performs research. You will need to click the model selector dropdown in the upper left-hand corner and select 1.5 Pro with Deep Research.

You will write a prompt instructing that mode to research the Hot Take Tuesday topic that you selected and include any relevant information that is needed to properly research that topic for the Hot Take Tuesday show. And then I gave it an example. You should always be walking this through step by step because, again, this is a human process that would take me probably about 20 or 30 minutes without distraction.

All right. And you might be saying, okay, Jordan, it looks like it's already taken five to 10 minutes. Yes. Right. But I can let this run autonomously. And I do believe that there will be a way to schedule these as well in the near future.

All right. So now after that, I gave an example of the type of prompt that it should put in. I'm not going to read that because it's kind of long, but essentially I'm saying when you use Google deep research, you need to put in this type of prompt. So just like you would give a large language model shots, right? A five-shot prompt is better than a no-shot prompt. I'm giving it some examples of what's good and what's bad when it's using deep research.
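The idea of giving the agent good and bad example prompts can be sketched like this. It is a minimal illustration only: the function name, the wording of the template, and the example prompts are all hypothetical and are not the prompt used in the episode.

```python
def build_research_prompt(topic, good_examples, bad_examples):
    """Assemble a research prompt that includes labeled good and bad examples."""
    lines = [
        f"Research the following Hot Take Tuesday topic: {topic}",
        "Be exhaustive and tackle the topic from every angle.",
        "",
        "Examples of GOOD research prompts:",
    ]
    lines += [f"- {example}" for example in good_examples]
    lines += ["", "Examples of BAD research prompts:"]
    lines += [f"- {example}" for example in bad_examples]
    return "\n".join(lines)

# Hypothetical examples, for illustration only
prompt = build_research_prompt(
    topic="AI pricing and its ethical implications",
    good_examples=[
        "Compare how three major vendors changed model pricing this year, with sources.",
    ],
    bad_examples=[
        "Tell me about AI.",
    ],
)
print(prompt)
```

Keeping the examples in lists makes it easy to add or swap shots after watching how a run went.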

All right. And then I'm saying, please be exhaustive in your search, making sure to tackle this from every angle. And then I'm saying step five: Google deep research will give you a content plan and you will click the blue button that says start research. Right. So there's actually multiple steps inside Google deep research. So it first needed to look at my example of a prompt and apply that to, essentially, the Boolean research that it went off and did on its own, right?

So are you looking at the number of steps here, y'all? And essentially the agency that I'm giving this agent, right? I'm saying, yo, go look at my Hot Take Tuesday. Essentially think like me, see what I cover. Then go do all my research. I believe it went through about 40 to 50 search results using that Boolean strategy, essentially that search URL that I shared with it. So it's looking at all these different news stories, trying to identify trends based on things that I already cover. All right, this is great. Then on top of that, you know, my hands have been in the air the whole time, more or less, right?

Then without any other instruction, it is going straight into Google Gemini's deep research. I gave it an example of how to use it. Otherwise it's going to stink. It had to verify, right? That's the other thing. Google deep research essentially starts and puts this plan together for you, and then it had to click to verify it. And then I told it, I think I told it, or maybe I told it in a different one.

Okay. So I did. I did say step seven: you will have to wait two to 10 minutes for this to finish. Right. And you'll see on my screen right now, Operator essentially keeps taking a screenshot, and it keeps saying awaiting completion of research analysis, right? Waiting for research analysis completion. But I told it, I said, you will have to wait two to 10 minutes for it to finish. There is a small icon that looks like two windows and a purple-ish status indicator.

All right. You will need to be patient for this to finish. And then I said, eventually on the left-hand side, it will say something like I've completed your research. Then on the upper right-hand portion of the screen, there will be a light blue button that says open in docs. Please click that button. So you'll see right now in Google deep research, it's researched 76 websites already, right?
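The "wait two to 10 minutes and keep checking" behavior described here is a plain polling loop with a timeout. The sketch below is a generic version of that pattern, not how Operator works internally: `is_done` is a hypothetical stand-in for "the page now says I've completed your research", and the 600-second default simply mirrors the 10-minute upper bound from the instructions.

```python
import time

def wait_until(is_done, timeout_s=600, interval_s=15,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

# Demo with a simulated page that reports completion on the third check.
# The sleep function is replaced with a no-op so the demo runs instantly.
checks = iter([False, False, True])
finished = wait_until(lambda: next(checks), sleep=lambda seconds: None)
print(finished)
```

Telling the agent up front how long to wait and what "done" looks like plays the same role as the timeout and the `is_done` check in this loop.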

I hope in the future, right, which I'm sure you'll be able to, that you'll be able to use OpenAI's Operator with tasks, with OpenAI deep research. But right now you can't, right? But this is the literal process that I always do. So you'll see right now, live stream audience, it finished live.

It finished completing the document. So it looks like it's trying to open the document and for whatever reason, okay, there we go. It had to try it a couple of times, but it put together this document. So what it decided the Hot Take Tuesday to be was the impact of AI on pricing and its ethical implications, which is actually pretty, pretty fascinating, right? Because when intelligence becomes cheaper and cheaper, what happens to humans and the ethics behind that, right? So pretty, pretty cool topic there that it decided to put together.

All right. So now I told it, I said, please save this document as a PDF. So it looks like it saved it as a PDF. So that's good. Then I also said before exiting this Google Doc, we want to copy all the text. You can do that by clicking and dragging, or just by pressing Command-A or Control-A, then Command-C or Control-C.

Then I told it, please go to NotebookLM, right? If it does not log you in, click on the Try NotebookLM button. If it does log you in, click on the blue Create new button in the upper left-hand side of the screen, which is what it's doing now. Then I said, click on add source. It's literally doing this in real time. And then I said, paste in all that information. Bam. It just did that.

Let's see if it does the next step here. This is pretty, pretty impressive. Good. It just clicked generate. So it's generating an audio overview for me at the same time. Right. Are you guys seeing what's happening here? This is what I do. This is what I do all the time. Right. I look on my website. I'm like, all right, I got to plan a show for this week. Let me see what I've covered recently. Right. I might go look at stats from our podcast as well, which I could do. Right. I could do this.

All right. Let's see. I was hoping it would finish it all. Let's see if it's going to. But this is what I would do. I would go look on my website. I would go do a bunch of research on, you know, Google or, honestly, deep research from OpenAI, but I can't do that right now. And then I would go into deep research.

I would take that topic, have it do a bunch of research. I would copy and paste that, put it into NotebookLM, generate an audio overview. This is literally what I would do. All right. And now hopefully it's, wouldn't this be weird? Let's see. Oh, it said it paused while I was away because I wasn't clicked on there. So I'm not going to count that as anything because I was just clicked on my other window.

All right. So isn't this wild? So now let's see if it can actually finish this task because the first time it failed a little bit. All right. So my last parts of the task are to go to my Gmail, send this to info at youreverydayai.com. Put a subject line in a brief. Love this.

Oh, look at that. It actually did it. It did it correctly on the second time there. It found the attachment right away. Bam. Look at that. It did the entire thing, right? It did the entire thing. All right. So now just to hopefully prove to everyone, I'm going to go ahead and open my email account. All right. There's a reason I did this on my old camera here. I'm sure no one really noticed, but I have to have my phone available here for all the 2FAs because now my computer, because I was essentially using a browser from probably another state here in the US, it's getting a little confused and I'm having to re-log into everything, which is a little annoying, but that's fine. So, all right, let's see. Let's go ahead and share my screen here, y'all.

Look at this email, email from myself. Look at this. Here's the email y'all. Hello. Please find the attached PDF document detailing the impact of

I love that I just best regarded myself live here on the Everyday AI Show. Then I can click. Here is the deep research section.

So look at this. There we go. And then I would probably take it one step further and have it also download the MP3 from NotebookLM and attach that as well. Right. But I wanted to show you an example of what I actually do. Right. This task would have probably taken me, like I said, 20 minutes. I should have timed it. I can go back and look.

And you know what? We're going to go share that screen anyways. So we can go back and see exactly what happened. All right. So if I go up here, all right. So it says worked for 11 minutes. All right, there we go. Worked for 11 minutes. So this process by myself would probably, like I said, take me about 20 minutes. So you might be thinking, okay, Jordan, well, a two-for-one trade-off. What's the big deal, right? Number one,

This is something where I can go be doing other things, right? I did get this working last night when I wasn't doing a live stream, where I was doing my own work just in another Chrome or Edge profile and it was working perfectly, right? So it just did my work at a very high level, right? And this was essentially my first time doing this. And as I always tell you, anyone that's taken our, you know, free Prime Prompt Polish course. And I know it's been like two months since we did that. I'm sorry. We're going to have new dates coming up. I'm getting a ton of emails on that. Essentially our, you know, hosting provider changed their plan. So we're moving it. We're rebuilding it literally from scratch.

I think it is going to be the best basic ChatGPT course on the internet. I think it's going to be better than courses that cost, you know, a thousand dollars. It's all going to be for free. So even if you've taken our PPP course, like five times, you're going to want to take this new updated one. FYI.

So anyways, this is a task that I would do, and I'm getting a two-to-one. And anyways, what I was getting back to: I'm going to go back and I'm going to look at this kind of chain of thought. I'm going to see what worked well and what didn't. Right. All right. So doing this one time doesn't mean a whole lot. Right. That's just to get the process down. So I want you to think,

What are those manual time-consuming tasks that you do across different domains, across different websites that you maybe have to be logged into? I just gave you an example of a task that I do fairly often.

Right. I'm going back. I'm looking at my old episodes. I'm doing some research on Google. I'm using my brain. I'm thinking, right? But now I can go back and look at this kind of chain of thought on Operator. See what I like that it did, because I can literally go back and watch the recording, which is great. And I can see step by step. So then I can kind of save my set of instructions and change them, improve them, right? So maybe that 11 minutes will get down to eight minutes, but not only that, but then I can look at increasing the quality of the output. So now I can not only do it in half the time, but I could do it even better, right? I can maybe make that task, oh, this is something that would now take me 30 or 40 minutes. And maybe I can still do it in 10 minutes while I'm doing something else.
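The trade-off here is less about the agent's 11 minutes versus the 20 manual minutes and more about hands-on time, since the agent runs unattended. A back-of-the-envelope sketch, assuming the task runs twice a week and that reviewing the agent's output takes about 5 minutes (the review time is an assumption, not a number from the episode):

```python
def weekly_minutes_saved(manual_min, review_min, runs_per_week):
    """Hands-on minutes saved per week when the agent runs unattended."""
    return (manual_min - review_min) * runs_per_week

# 20 manual minutes per run (from the episode), 5 minutes of review (assumed),
# two runs per week ("a couple of times a week")
print(weekly_minutes_saved(manual_min=20, review_min=5, runs_per_week=2))  # 30
```

The same arithmetic applies to each of the recurring tasks you hand off, so the savings add up across them.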

And then think, think of these three, five, 10 ongoing little projects or tasks that you do all the time. And maybe right now there's no other way to automate them, right? Maybe right now you're just automating the pieces, but you can't automate the whole.

This is where Operator changes that, right? So yes, some of these things you could already do by using something maybe like Zapier, by using some APIs or Make.com or something like that, right? And speaking of, we have to talk about APIs, right? This is how like 1% of the internet talks to each other, right? But what about for the other 99%? This is where CUA, or computer-using agents, comes into play. You also have to, you know, tip your hat to the Anthropic team that came out with their computer-using agent. I think it was back in October. It just wasn't usable.

You had to download Docker, which is an extremely compute-intensive program on your desktop. You had to go into a GitHub repo, and it timed out every five seconds. You just saw it did an 11-minute task all on its own. I didn't limit out or anything like that. Granted, I am on that $200 a month Pro plan.

All right. I do want to show you a couple other things on the Operator interface. Okay. So like I said, this does look kind of like ChatGPT. All right. A couple of things. I wish you could rename these kind of Operator tasks. You can't right now. You can only delete them. That's one thing to keep in mind. All right. Another thing is you're always going to have your active tasks. So I have run up to three at the same time. I don't know if that actually slowed it down or not, but keep in mind there's limits that are dynamic, so you don't know what that actually means. Let's go into the settings because this is kind of important. So you can go in here to save tasks. So I'm going to go into the one that we just did, and then I'm going to go click save tasks.

All right. It's going to auto-generate a title and the detailed instructions. So in this case, I would not use these detailed instructions. It's the same kind of piece of advice that I gave you guys for ChatGPT tasks. Never let ChatGPT save instructions on its own. It's not going to work. So it really just abbreviated those instructions. So I'm going to paste all of these in. So it has it. And then, so it says title: research trending AI topic. The detailed instructions, I copied and pasted those in manually. And then it says websites.

So it's going to use gmail.com. It's going to use youreverydayai.com. So if it ever starts going in the wrong direction, you can put that there. gemini.google.com. And then we had NotebookLM. So now if I am running into issues, I can essentially save this as a task first. Let's see, it doesn't look like it saved it. Let me just double check that there. I'm so zoomed in on my interface here.

I think I just had to zoom out. There we go. All right. So then, yeah, I can go here, type in the URLs, whatever. I'm just showing you all an example. Oh, here's the downside. So it looks like this is why it didn't save. The instructions cannot exceed a thousand characters, which stinks. So let's just show you what this looks like. So this wouldn't work now. All right. Well, let's just click save tasks. Sorry, y'all. All right. I'm going to save that.
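Since saved-task instructions apparently cannot exceed a thousand characters, it can help to check the length before pasting them in. The sketch below models a saved task with the three fields mentioned above (title, detailed instructions, websites). The `SavedTask` class and its field names are hypothetical, purely for illustration, and are not an OpenAI API.

```python
from dataclasses import dataclass, field

MAX_INSTRUCTION_CHARS = 1000  # limit mentioned in the episode

@dataclass
class SavedTask:
    # Hypothetical structure mirroring the fields shown in the save dialog
    title: str
    instructions: str
    websites: list = field(default_factory=list)

    def overflow(self):
        """Number of characters over the limit (0 if the instructions fit)."""
        return max(0, len(self.instructions) - MAX_INSTRUCTION_CHARS)

task = SavedTask(
    title="Research trending AI topic",
    instructions="Step 1: go to the episodes page. " * 40,  # placeholder text
    websites=["gmail.com", "youreverydayai.com", "gemini.google.com"],
)
if task.overflow():
    print(f"Instructions are {task.overflow()} characters too long; trim before saving.")
else:
    print("Instructions fit within the limit.")
```

Long multi-step prompts like the one in this demo will usually blow past that limit, so the full version needs to live somewhere else, such as a document you paste from.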

So now that is going to show up in my saved task right there. So then at any time I can go in and modify that as well. All right. A couple other things. And these are things I don't even think you should pay much attention to if I'm being honest, right? So when you do go to the homepage here, so now I have my saved task and I can click that. I can edit it or I can click it and it will launch it right there.

But don't pay attention. These are the things that OpenAI demoed. Don't pay attention to these. This dining and events. So these are essentially prepackaged prompts. And it does look like OpenAI partnered with some of these websites and companies to provide a more seamless experience. Like I said, I would never use Operator for any of these tasks because it requires too much human in the loop.

When I'm using an agent, I want to save time. I don't want to sit there and just be like, oh, cool. And then like answer a question every 45 seconds. That's a waste of time. Right. So you can go through here and, you know, use OpenTable to reserve a table or StubHub to do tickets. You know, Uber Eats, Instacart. Right. All these things. Thumbtack, Uber. Like, I don't know. I'm not going to use Operator to do an Uber. I'm going to use my Uber app, right? But a couple of other things to keep in mind: you can go in here into your websites. So for all of these, you can give them custom instructions. So for Booking.com, I can go in and set instructions. I could say, you know, like, I like modern interiors and outdoor spaces, right? So then if I'm using Booking.com or whatever, it will take those preferences into mind. So you can do that for all of those websites that they work with, as well as news. So these are all the news organizations that OpenAI has partnered with. So I can go to, you know, the Associated Press, I can click edit and, you know, type in custom instructions for the Associated Press as one example.

So I hope and wish that in the future, you'll be able to add your own websites, uh, that you'll be able to store, uh, your credentials for all of those, uh, right. That would be extremely helpful. All right, y'all.

That was a lot. So I think there's a couple of questions. I know that this is already an extremely long episode. Angie just said, holy shh. All right. Sandra said she was blown away. All right. That's good. So this was helpful. All right. That's good. So yeah, even though this was a little bit of a longer episode, of a longer process here, y'all. Thank you. So, all right, I see a couple of questions. I'm going to try to answer some of these as quickly as possible. All right, just scrolling through. Let's look at some questions.

Douglas: have you checked out any open source Operator solutions? Yes. So there's Browser Use. There's a couple of other ones that have become extremely popular. I've done a couple tests, but I'm using Operator more. Right. Yes, there's other great kind of open source-esque and fully open source projects that do this. The reason I'm not using them is because you have to think of the future, right?

The future is Operator is probably, within hopefully weeks or months, going to be able to work with ChatGPT tasks. It's going to be able to work with deep research. So in my mind, it is not worth, you know... Like, I think you have to choose your ecosystem, right? And I'm choosing, right, for at least when I'm on my Mac. I have my Windows computer, my Windows Copilot+ PC, that I still got to get set up and using. But for the most part, in my day to day, I'm using ChatGPT, right?

I have free plans, Plus plans, Team plans, Pro plans, Enterprise plans, because we train companies, obviously, right? This is my business operating system. So even though there are, you know, some other better, or I won't say better, there's some alternatives that may be cheaper, I'm working for the future here, Douglas. I'm not working for today, right? Because in the coming, probably, weeks or months, Operator is probably going to start working with everything else. So I am currently building skills using Operator that are going to pay off as, number one, Operator gets better, and number two, it starts to work with all the other products and tools in the OpenAI ecosystem.

Woozy: what's the coolest use case you've seen anyone do with it, Jordan? What's up, Woozy. Hey, I'm sorry about your Chiefs, buddy. I'm sorry. Caught a beating there. All right. So what's the coolest use case you've seen? I mean, it's limited, right? It's limited because right now the virtual machines that this uses, they don't have a lot of computing power. So I don't know. If I'm being honest, some of the coolest stuff is what I showed you guys, right? Using deep research, using other large language models, I think is great. I think it would be cool when it can consistently handle using something like Cursor or something like GitHub Copilot, right? But right now it's not there because you still have to have kind of that quote unquote virtual machine compute and it doesn't have it. So anytime you try to do anything that's a little too, you know, power intensive, you're going to get a warning.

Sandra: when are your prompt classes resuming? Hopefully in March. Pedro: how could you prompt the model to be iterative with other AI models? So yeah, I kind of just showed you an example of that, right? It was using Gemini. And I did give it an example of a prompt to do the deep research. So you have to give it examples, you know, in your instructions essentially.

Another question, Pedro: would you use this to dive deep into X using Grok to search for news and hot topics and process the data as you did? Maybe. I personally think Grok stinks. The only thing that I think Grok is decent at is searching X or Twitter. And in many instances for what I want to use it for, it doesn't do well. So a lot of times I'll say like, okay, let's say today's February 11th, right? I'll say, hey, give me the top AI news for February 11th. And it'll bring in things from two weeks ago. Right? So I don't think Grok is a good model. I wouldn't recommend businesses use it. So I'm not using Operator to, you know, do anything. Big Bogey says, looks like it needs some improvement. Hot take: how do you rate it?

It's an A, right? Especially after using some of these open source tools and Claude's computer use, it's an A, right? A lot of times what I find is once you go through and you improve, you run something once, you look step-by-step and see what it does, and then you improve your instructions, in most cases it's going to do it extremely well. I mean, in my use case, I had it query something, click on my website, click on the search bar, search for something, go back, use the pagination, right? Look at multiple pages of my website, understand trends, then go use Boolean search, research something, find what it thought was helpful, then go into deep research, which requires multiple steps, right? Like you saw what it did. That is amazing. And maybe I'm just blown away because these are what I feel are mundane, repetitive tasks that I do over and over. And now I can just be like, yo, Operator, you go do this. And then it's going to get better at it than me. Because guess what?

It is using the GPT-4o model. So it will be able to summarize, synthesize, and understand information better than I can, period. Right? So how do I rate it? A, right? If I look at this in six months, because it's probably going to improve, I will probably look back at it and be like, yo, that was a D. But right now, it's extremely exciting.

Cecilia, how are your passwords protected when you have the agent log into your accounts? So that's a good question, Cecilia. I read that last night. I thought I took a screenshot of it and put it in my presentation. I didn't. So I'll make sure to put that in the newsletter. Pedro, should companies set agent accounts? Yeah. I mean, companies need to be using agents, period. Yeah.

Marie says, I see it can save the task. Does it also save the sidebar commentary? That is saved by default. So you don't have to click save task to save that sidebar commentary. So I can go through at any time, anything that I've run in operator, I can literally go and rewatch the entire process with the commentary. So you just have to click that expand window and I can go just like you can kind of see that chain of thought.

I can see the entire step-by-step process in there. All right. Sandra says, can it use Canva? I don't know. Should we find out? Should we find out here? Well, actually, no, that's going to take too long. I'm going to have to 2FA it. But I believe, yes, it can from what I remember in my research, Sandra. But it's not going to work very well.

For anything you want it to do that's extremely visual, it's not going to work very well because essentially what it does, even to click in and type things in, it takes a screenshot. So if you were like, oh, go update this template or create a design, that's not really what it's for, right? Right now, at least. Maybe in the future, it will do a good job at that. But you saw it put together a very, I mean, albeit plain, it put together a PDF presentation for me. It resized the font. You know, it's not going to win any design awards, but it at least went to Google Slides and copied and pasted all that information over there that it did for the SWOT analysis.

All right. Doug was asking, do the Refine Q principles work here? Yes, they do. Your prompt engineering basics are always going to work. It's always going to improve it. You always need to iterate on the result. Don't run it once and say, oh, this is the best it's going to be. No, run it once, watch it, right? It's very tempting to just let it run and then go do something else. But again, think of that task that you do every single day that takes you 30 minutes, takes you two hours.

It might take you way more than that to, you know, automate this and to make it, you know, a solid Operator workflow. But think, if then you can get that two-hour task to where you don't have to do it, that's amazing, but you're going to have to iterate. So yes, the Refine Q approach that we teach in our free Prime Prompt Polish PPP course does work fairly well. And yes, basic prompt engineering, you know, works well. Give it examples, tell it what's good and what's bad, right? Provide feedback. You know, improve your set of instructions each time. Rerun it. Tweak it, right? You need to be doing these things. Agentic systems are not one shot. They require human in the loop. They require constant improvement, constant refinement, because they're going to get better and better as we go. All right, looks like I tackled all the questions.

I hope this was helpful, but let me just recap it. Is OpenAI's best AI agent Operator? Yes, it is. Is it the one I'm going to use the most? Probably not, right? If I'm being honest, I'm using deep research a ton. I'm using tasks a ton because they're scheduled, they're running autonomously. But I do think Operator is the best because, like I started the show out with, right, I think a lot of people are seeing these individual, these fragmented use cases of AI. Right. But they're like, I still have to take these 20 pieces and put them together. Right. So a lot of people say, OK, it's not just doing my work yet. I thought that's what, you know, the future of AI and large language models was. It was just going to do our work. Well, here we are, you know, going from the reasoner step to the agent. We're there, right? I just showed you that is a task that I do over and over and over and over again. I just trained, live here on the show, I just trained Operator to do it for me. And I'm going to go in and I'm going to improve it, right? I'm going to have it send me that, you know, NotebookLM deep dive or maybe send me a link to it, right? But now I can do better.

Right. I can do better. Instead of maybe looking at one of those reports, I can have it do three and then I can sit, I can read the report. I can listen to the deep dive and I can use more of my brain, more of my creative ability, more of my kind of strategic decision-making. Right. I can leave some of those mundane, repetitive manual tasks that up until Operator I could not fully automate, but now I can. So that's why I do think, and I don't say these things lightly, but this is a revolutionary step. This is a giant leap for the future of AI because the future of AI, like we've been saying for a long time, it's agentic, right? It is working in a multi-agent environment, giving agency, decision-making, passwords, right? Giving everything to an AI system, keeping the human in the loop, but then changing what we as humans work on.

All right. I hope this was helpful, y'all. If so, if you want to put this in practice, I'm going to send you an example of exactly what I did. I will send you my instructions. So just go click repost. If this was helpful, if you're listening on LinkedIn or Twitter, just click that repost button. You can tag me in the post, you know, to make sure I'll send this to you. Also, for anyone that does repost this, I'm putting this out there.

I don't know what we charge anymore for like a 90-minute consult. I think it's, I don't know, like 350 or $400, something like that, right? Anyone that goes and shares this on LinkedIn, I'm going to enter you into a little giveaway. I'm going to announce it in the newsletter probably next week. So then that way our podcast audience, you all have time to go click the LinkedIn show for this. Go click repost, right?

So I don't know whether there's two people or 50 people that reshare this. I'm going to put all your names in a digital hat. I'm going to draw one and then give whoever does win this a 90-minute consult. All right. So whether you want me to help walk your team through Operator, whether you have questions about ChatGPT, whatever it may be, you get 90 minutes. All right. I'm not going to put together anything for you. You essentially just get my time, right? Talk to me. I'll answer questions, whatever it is you need. I'll do that. So make sure to share and repost this if this was helpful. Also go make sure you check out that AI predictions and roadmap series. Thank you for tuning in. I know this was a long one. I hope it was helpful. I hope I see you back tomorrow and every day for more Everyday AI. Thanks y'all.

And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going. For a little more AI magic, visit youreverydayai.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
