People
Alondra Nelson
David Sacks
OpenAI
Announcer
Host and founder of the well-known true crime podcast Crime Junkie.
Sources
Trump
American businessman, politician, and media personality who has served as the 45th and 47th President of the United States.
Trump administration
Topics
David Sacks: I'm announcing that the administration's policy is to make America the global leader in artificial intelligence and to dominate and lead the world in AI.
Trump administration: The new executive order is meant to accelerate AI development, with a touch of the culture war for good measure. Its overall policy direction is to sustain and enhance America's global AI dominance in order to promote human flourishing, economic competitiveness, and national security.
Announcer: The Trump administration wants to accelerate AI development but isn't yet sure what steps that will require. Its goal is for the U.S. to be the global leader in AI. Its policies could harm Americans' rights and safety and could lead to an uneven playing field in AI.
Alondra Nelson: The Trump administration's policies could harm Americans' rights and safety and could lead to an uneven playing field in AI. Agencies will be asked to review programs that are already helping people, and those programs may be unwound.
Trump: The administration will give rapid approvals to build power plants in the United States to meet the energy demands of AI data centers, and it has removed the energy targets that were constraining the AI industry. It will provide ample energy for AI data center construction and drop the climate goals.
OpenAI: Operator is an agent that can use a browser to perform tasks and is good at repetitive browser work such as filling out forms, ordering groceries, and creating memes. Operator works in a virtual browser window running in a cloud instance; users can watch it the whole way through or let it run in the background. When Operator hits challenges or makes mistakes it can self-correct, and when it needs help it hands control back to the user. Operator is driven by a fine-tuned version of GPT-4o, and it hands control back to the user when sensitive information such as credit card details needs to be entered.
Sources: Details of Project Stargate are still being worked out, and the funding is not yet fully in place.

Deep Dive

Chapters
President Trump issued a new executive order on AI, aiming to make America the global leader in AI. The order focuses on accelerating AI development and removing perceived obstacles, leading to concerns about potential impacts on safety and fairness.
  • Rescission of Biden's AI executive order
  • Focus on AI acceleration and global dominance
  • Concerns about deregulation and potential risks

Shownotes Transcript


Today on the AI Daily Brief, OpenAI launches its agent, Operator, and before that in the headlines, the latest on President Trump's AI executive order. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. ♪

Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Today is one of those days where we basically have two main episodes crammed together as one. Later on in the main episode, I'll be talking about OpenAI's operator. But for the headlines, we're talking about a set of Trump policies, starting off with a new executive order on artificial intelligence.

Going back a few days, on Monday, the Biden AI executive order was one of the many rescinded by the incoming administration. To be fair, the substantive parts of that 2023 order had largely already played out, mostly to do with government departments filing reports. The major ongoing policy was a mandatory testing and disclosure regime conducted through the AI Safety Institute, with major labs potentially continuing with this on a voluntary basis.

Anthropic CEO Dario Amodei even commented earlier in the week that the repeal was, quote, not a big deal. The big question then was what would take its place. We didn't have to wait very long to find out, with Trump outlining his AI agenda on Thursday. AI czar David Sacks explained the order to the president in the Oval Office, stating, "...we're announcing the administration's policy to make America the world capital in artificial intelligence and dominate and lead the world in AI."

Overall, the new executive order is primarily a vibe shift towards AI acceleration with just a little touch of the culture war for good measure. It says,

That, of course, is the culture war part, as you can see from media who picked up on the free from ideological bias piece. Still, overall, the order sets the new overarching policy direction for the U.S., quote, to sustain and enhance America's global AI dominance in order to promote human flourishing, economic competitiveness, and national security.

Substantively, the order directs the heads of various White House advisors to submit an action plan to achieve this policy within 180 days. And so in this way, while the substance of it might be different and while the action plan that they're looking for might be different, the function of the EO isn't all that dissimilar to what Biden put out in that it's really more of a first step towards getting the various White House bodies in alignment around a set of new policies.

Sacks is, of course, playing a leading role in this process alongside the Advisor for Science and Technology and the Advisor for National Security. Input is also required from the Advisors for Domestic Policy and the Office of Management and Budget. The order also directs Sacks to survey all agencies for any action taken as a result of the Biden executive order. He is required to determine whether they are, quote, inconsistent with or present obstacles to the new policy directive. Within 60 days, the agencies are required to halt any initiatives deemed to be a problem.

And that's just about it. A brief one-page document compared to the sprawling 111-page Biden EO.

I think most people's sense reviewing this is that the Trump administration knows that it wants to accelerate AI development, but isn't sure yet what steps it needs to take to do that. The EO is basically a planting of the flag that says an important first step is removing the Biden guardrails as minimal as they were. As the companion fact sheet claimed, the Biden AI executive order established unnecessarily burdensome requirements for companies developing and deploying AI that would stifle private sector innovation and threaten American technological leadership.

Again, appearing on Fox News, Sacks said that the core point was to make the U.S. the global leader in AI.

Unsurprisingly, there are plenty of folks who are concerned about what will come next. Alondra Nelson, the former acting director of the White House Office of Science and Technology Policy under Biden, noted that agencies would be tasked with reviewing initiatives, quote, that are already helping people with an implicit intent to unwind them. She continued, in 60 days, we'll know which Americans' rights and safety the Trump administration believes deserves to be protected in the age of AI, and if there will be a level playing field for every technologist, developer, and innovator, or just the tech billionaires.

Still, on the flip side, when it comes to industry and the accelerationists, the attitude might be summed up by Based Beff Jezos, who writes, unfathomable levels of e/acc victory.

Now, maybe a more substantive policy, which came through a virtual appearance at Davos, was President Trump announcing plans to accelerate energy policy for AI data centers. He said, and I am clipping, editing, and paraphrasing because this is President Trump we're talking about here, "...we're going to give rapid approvals to build electric generating plants in the United States. We need double the energy we currently have in the U.S. for AI to really be as big as we want it. I'm going to give emergency declarations so they can start building them almost immediately."

The national energy emergency was declared on day one of the presidency and directed government departments to use any tools they have at their disposal to expedite construction. The new part of the policy is that the administration has removed any climate targets that were binding the AI industry. For Trump, this seems to mean turning back to coal power. He said there are some companies in the U.S. that have coal sitting right by the plant so that if there's an emergency, they can go back to that. Now, of course, this doesn't necessarily mean big tech companies are all of a sudden going to start building a ton of coal plants.

Their climate goals are as much about internal pressure from employees and leadership, as well as public perception, as they are about government policy. Instead, what many anticipate is the construction of gas-powered turbines, which can be built quickly and relatively cheaply, as well as the red tape around nuclear facilities being slashed in order to ensure that new projects don't get stuck in the regulatory quagmire we've seen over the past few years.

The other pillar of the policy is ensuring new data centers are able to build exclusive on-premise power stations. Power utilities have lobbied against co-location in the past, warning it could lead to supply shortages. More realistically, however, co-location tends to just cut out these middlemen and reduces the need to wait for new infrastructure to be built. The optimistic take is that with the average wait time for connection to the grid ballooning out to multiple years, this policy change could speed up the deployment of new data centers significantly.

Lastly, an update on Project Stargate. The participants are claiming that they have the money despite what Elon Musk says. Tuesday's announcement of the $500 billion Project Stargate shook up the AI industry, implying an infrastructure build-out even more significant than the Manhattan Project.

Not everyone was convinced, with Elon Musk saying that they didn't actually have the money. According to The Information, though, they do actually have the money, or at least enough to get started. Their report said that SoftBank and OpenAI have each committed $19 billion to the joint venture, although it's not exactly clear where OpenAI is getting its $19 billion, and that Oracle and Abu Dhabi-backed fund MGX are kicking in a further $7 billion apiece. Still a few pennies short of the $100 billion price tag for the first year of the project, but they're probably good for it. At least that's what President Trump thinks.
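
For a rough sense of how far short those reported commitments fall, here is the arithmetic, taking The Information's figures at face value:

\[
\underbrace{\$19\text{B}}_{\text{SoftBank}} + \underbrace{\$19\text{B}}_{\text{OpenAI}} + \underbrace{\$7\text{B}}_{\text{Oracle}} + \underbrace{\$7\text{B}}_{\text{MGX}} = \$52\text{B} \quad \text{against roughly } \$100\text{B for year one.}
\]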

When asked about Elon's claims, Trump said, I don't know. They're putting up the money; the government isn't. They're very rich people. I hope they do. And then he pointed out that Elon just hates one of those people. And he understands, because he hates people too.

There you have it. In the meantime, more details are emerging about the scope of the project. A source speaking with the Financial Times said that Stargate wouldn't rent out its compute, commenting, the intent is not to become a data center provider for the world. It's for OpenAI. Another source said that details are still being worked out, stating they haven't figured out the structure, they haven't figured out the financing, they don't have the money committed. And yet the first data center is under construction in Abilene, Texas. Sam Altman posted a video of the sprawling site, commenting, big, beautiful buildings.

With that, though, we will conclude the headlines. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in.

Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company.

Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time.

For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off.

If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.

That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.

If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.

Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.

For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com slash US.

Welcome back to the AI Daily Brief. Yesterday, I had a classic thing happen. This show is a daily show, right? Six out of the seven days of the week, there is an AI Daily Brief talking to you about the latest AI news and discourse. And you would think that daily is a frequent enough cadence to actually capture and be up to date with all the news.

Alas, sometimes even that isn't enough. And yesterday we had one of those situations where the headlines part of the episode talked about how it appeared that Operator would be coming this week. And between the time that I finished recording and when it was actually published, Operator had come out. I had a feeling as I was recording that that was going to happen. But in any case, that means that today we get to actually look at Operator, which is, of course, OpenAI's first true or at least advertised to be true agent project.

They call it an agent that can use its own browser to perform tasks for you. So let's find out what it is, and then we're going to talk through seven ways that people are using it already. Operator has been long in the making. Indeed, even as recently as a couple of weeks ago, there were news articles coming out exploring why OpenAI hadn't released an agent yet. Their announcement post describes Operator as an agent that can go to the web to perform tasks for you. Interestingly, it uses its own browser.

And with that browser, it can look at a webpage, interact with it through typing, clicking, or scrolling. OpenAI is to some extent planting a flag here around what an agent is, referring to them as AIs capable of doing work for you independently. You give it a task and it will execute. They suggest that this research preview version of Operator is good at repetitive browser tasks, such as filling out forms, ordering groceries, and creating memes.

Now in terms of how it actually works, there is some similarity to the way that Anthropic's computer use mode is designed. The agent takes constant screenshots to see what it's doing in the web browser and can take control using the mouse and keyboard. Unlike Anthropic though, OpenAI has implemented this as a fully remote setup. After receiving instructions, Operator opens its own virtual browser window in a cloud instance. You can watch it carry out its task or you can click away and get on with other work while Operator works in the background.

Users retain full control of their computer with Operator running in its own fully contained browser. This of course limits the specific things that it can do, but it also makes it more usable at the same time. OpenAI has worked with specific major websites like StubHub, DoorDash, and OpenTable to try to improve and smooth out the integration, but theoretically, Operator can access any website that it needs to carry out its task.
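
To make that a bit more concrete, here is a minimal sketch of the observe-decide-act loop described above, written against Playwright's Python API. This is illustrative only: decide_next_action is a placeholder for the vision model call that OpenAI's CUA actually makes, not a real OpenAI API, and the action format is an assumption.

```python
# Minimal sketch of a screenshot-driven browser agent loop (not OpenAI's code).
from playwright.sync_api import sync_playwright


def decide_next_action(screenshot_png: bytes, goal: str) -> dict:
    """Placeholder for the vision model: given a screenshot and the user's goal,
    return one structured action (click, type, scroll, or done)."""
    return {"kind": "done"}  # stub so the sketch runs and exits immediately


def run_agent(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)   # the "virtual browser window"
        page = browser.new_page()
        page.goto(start_url)

        for _ in range(max_steps):
            shot = page.screenshot()                 # observe the current page
            action = decide_next_action(shot, goal)  # model picks the next step

            if action["kind"] == "click":
                page.mouse.click(action["x"], action["y"])
            elif action["kind"] == "type":
                page.keyboard.type(action["text"])
            elif action["kind"] == "scroll":
                page.mouse.wheel(0, action["dy"])
            elif action["kind"] == "done":
                break

        browser.close()


run_agent("add the items on my shopping list to a cart", "https://example.com")
```

The point is just the shape of the loop: screenshot the page, let a model choose a single click, keystroke, or scroll, apply it, and repeat until the model decides the task is finished.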

There is a lot of human in the loop here as well. OpenAI writes, If operator encounters challenges or makes mistakes, it can leverage its reasoning capabilities to self-correct. When it gets stuck and needs assistance, it simply hands control back to the user, ensuring a smooth and collaborative experience.

Indeed, in addition to helping Operator deal with certain types of issues, taking over is also required to finalize certain tasks. For example, this version of Operator does not have access to credit card details, so if that's part of completing the task, it hands the system back over to the user to complete that particular step. Operator also asks for feedback at critical moments within its tasks. Under the hood, OpenAI has fine-tuned a version of GPT-4o to drive Operator, which they're calling their Computer-Using Agent, or CUA.
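
As a rough illustration of that hand-back behavior, here is a small sketch of the kind of check an agent loop might run before each step. The keyword list, retry threshold, and field names are assumptions made up for this example, not OpenAI's actual logic.

```python
# Illustrative hand-back check for a human-in-the-loop browser agent.
SENSITIVE_KEYWORDS = ("card number", "cvv", "password", "ssn")


def is_sensitive(field_label: str) -> bool:
    """Heuristic: does this form field look like it needs sensitive user input?"""
    label = field_label.lower()
    return any(keyword in label for keyword in SENSITIVE_KEYWORDS)


def should_hand_back(field_label: str, consecutive_failures: int) -> bool:
    """Return True if the human should take over before this step runs."""
    if is_sensitive(field_label):
        print(f"Hand-off: please fill in '{field_label}' yourself.")
        return True
    if consecutive_failures >= 3:
        print("Hand-off: stuck after several retries; please take a look.")
        return True
    return False


# Example: the agent reaches a payment form and pauses for the user.
if should_hand_back("Credit card number", consecutive_failures=0):
    input("Press Enter once you've completed this step to resume the agent...")
```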

As far as benchmarks go, CUA achieved an 87% success rate on WebVoyager, which is a live website navigation test, and a 58.1% success rate on WebArena, which simulates e-commerce and content management situations. Much better than vanilla GPT-4o, but certainly not necessarily the level of reliability we'd want before these types of experiences become endemic.

Speaking of which, as VentureBeat points out, TikTok parent ByteDance also launched its own AI agent for controlling web browsers yesterday, called UI-TARS. They write that it's totally open source and boasts similarly impressive benchmark performance, which makes them wonder if people will be willing to pay for ChatGPT Pro's $200 a month, which is the only way that you can get access to Operator at the moment. As has been the custom with OpenAI releases lately, the feature is only available to US Pro users, with Sam Altman saying that Europe will unfortunately take a while.

So let's talk now about some of the ways that people are actually using Operator. Keep in mind, these are all very nascent, first-test kind of use cases, and it always inevitably takes some time to really figure out the best ways to use any new capabilities like Operator offers. Certainly when it comes to how OpenAI was positioning this, it's a lot of the very basic assistant tasks that I've often on the show said I don't think are going to be the real drivers of agent behavior when it comes to consumers.

Ultimately, whether I'm right and these aren't the long-term drivers of agentic behavior, or I'm wrong and this is exactly what people end up wanting to use agents for, it's clear that they're valuable as a test case and as a way to start training and giving agents capabilities. The first use case that many people shared was some version of grocery shopping. This was one of the examples, in fact, that the OpenAI team used to demonstrate Operator's capabilities. They gave it a shopping list written down on a piece of paper and said, can you buy these for me, please? And Operator goes and brings the list to Instacart,

and after it's found the items and added them to the cart, asks whether it should finalize the order.

In a week when crypto has been booming, it's appropriate that another experimental use case, this one from Rowan Cheung, who of course runs The Rundown, is crypto investment research based on tokens that are actually worth looking into. Obviously, you could generalize this use case as research. The reason that I thought this example was interesting to share was that it demonstrates one part of the human-agent interface. At one point, Operator got hit with an "are you human" captcha and pinged Rowan to take control again to confirm and move forward.

Number three is another very common demonstration use case, and once again, one that I've railed on before: travel planning. Y Combinator president Garry Tan writes, OpenAI Operator is very impressive. Planning an impromptu trip to Vegas, it's able to navigate JSX's website and handle unusual cases and basically figure out sold-out scenarios, change dates and times, and now it's figuring out where to eat for Friday night for two.

I will say that when it comes to this type of assistant use case, the more complex the travel is, in other words, the more details that need to be solved, the more I can see this particular type of interface, which just chatters at you to get the information it needs to execute, being an actually useful update. A fourth use case, this one once again from Rowan, researching a good birthday gift for my mom based on what she likes. A couple things that were interesting about this experiment. First of all, there were certain times and websites that it couldn't access, and it was capable of switching gears and finding another site that would do something similar.

It also, in addition to looking for specific items, took it a step farther and actually helped compare and find the best price across the web. Number five, staying on the theme of rote regular tasks, A16Z partner Olivia Moore says, I just gave operator a picture of a paper bill I got in the mail. From only the bill picture, it navigated to the website, pulled up my account, entered my info, and asked for my credit card number to complete payment.

Once again, you see here that it's not going to take that final step of actually inputting the credit card number without human approval. Although presumably in the long run, that might be something that people get more comfortable with actually allowing and various agent assistants actually enable as well. Sixth use case, and this is I think where it gets a little bit more interesting from a business standpoint, is actually using the tool for sales. This comes from Pocketflow AI's Helena Zhang, and let's just listen to the 30 seconds of what she did.

Hi, here's a list of powerful women at companies we would love to work with. And I want to reach out to their head of AI with such a message. So I have prompted operator and talking to the operator. This is just so cool. So basically what operator did here was take a list of names, find their LinkedIn profiles, and add a message to connection requests, effectively doing prospecting.

Lastly, our seventh use case, and again, I saw a number of different examples of this, was using the agent to build apps. BabyAGI creator and VC Yohei writes, I used OpenAI Operator to build, deploy, and open source a tool on GitHub using Replit Agent. Took about 30 minutes. He also gave some feedback, writing, While working with Replit Agent, it actually deployed the app, tested it, and described the error back to Replit Agent for me. Operator asked me a few more questions than I wanted, but it was mostly for safety, e.g. filling forms, so I guess okay with it.

It had trouble with a few things around UI, like knowing it needs to scroll a page to see the rest of it, and it needed pointers to find the git feature in Replit. Once it found the git feature, it didn't need my assistance to create a repo and open source after having the agent write a readme. While a bit slower, this was even more automated than Replit Agent, especially testing features and working through errors, which is impressive.

The app that Yohei built, by the way, was, quote, the classic to-do app with a twist. It's for agents. API for agents to create, read, update, delete tasks. User web UI for manually managing tasks. Test UI for testing endpoints and API performance metrics. Kishan also made an app, sharing a video and tweeting, Used ChatGPT Operator to use Bolt to create a project management app. A general agent using a coding agent, and it worked pretty well. I even deployed the app. This is insane.

So basically we had here exactly what he described: this general agent, which is Operator, using the specific Bolt agent, which is a web coding agent, to create something, and it worked. When you see things like this, which open up fundamentally new possibilities and things that were never possible before, that's why I'm more skeptical of the very basic, superficial, do-my-grocery-shopping-for-me type of tasks.

Sure, it could be that assistants get so good at those things that it's not even worth a tiny handful of minutes that it used to take to do them. But certainly what gets me excited and what I think is going to drive more uptake are these never-before-possible things like building complete applications in this way.

Ultimately, the way that I would describe people's general attitude towards this is that while it isn't a lightning-bolt, ChatGPT-style moment, Operator is just good. It's not great at everything yet. It has some challenges, but it's definitely a preview of the future and where we're headed. I anticipate over the next few weeks, we are going to see a ton of different use cases thrown at this thing, and probably some that start to take off as genuinely and regularly valuable.

I will, of course, be back here to share those with you as they happen. But for now, that is going to do it for today's AI Daily Brief. And until next time, peace.