People
AI for Success
Beff Jezos
Chubby
DeepSeek
Microsoft Deputy General Counsel
OpenAI sources
Sam Altman
Leads OpenAI toward AGI and superintelligence, redefining the path of AI development and driving the commercialization and application of AI technology.
Santi Geneschatz
Snap
Tibor Blaho
Tyler Cowen
Topics
Bob Gourley: I think rescinding the AI executive order is the right move. It will help accelerate AI innovation, which addresses many of the challenges the AI field currently faces. Removing redundant regulation will let us respond faster to the issues AI raises.
Beff Jezos: I see this as a major victory for e/acc, and it's only the beginning. We're going to see many more positive changes.
Miles Brundage: However, rescinding the AI executive order also carries risks. AI companies will no longer be obligated to give the US government status updates on the technology they're building, which could pose a threat to humanity.
Lina Khan: Partnerships between big tech firms and AI startups can create market lock-in, deprive startups of key AI inputs, and reveal sensitive information that undermines fair competition. We need to watch this closely.
Microsoft Deputy General Counsel: Our partnership with OpenAI has enabled one of the most successful AI startups in the world and spurred a wave of unprecedented technology investment and innovation across the industry.
Andrew Ferguson: I believe the FTC's complaint against Snap's AI chatbot runs afoul of free speech protections.
Snap: The FTC's proposed complaint is inaccurate, lacks concrete evidence, fails to identify any tangible harm, and raises serious First Amendment concerns.

Shownotes Transcript


Today on the AI Daily Brief, OpenAI's O3 Mini seems to be coming soon, but could we also get PhD-level superagents? Before that, in the headlines, in one of his first acts as president, Donald Trump has revoked Biden's executive order on AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.

Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. We kick off today with something that was expected, but is still no less significant. Kickstarting the Trump era for AI development in the United States, the incoming president has repealed Biden's AI executive order. President Trump spent much of last night repealing executive orders from the previous administration and signing his own.

Among them was the October 2023 order, which governed the, quote, safe, secure, and trustworthy development and use of artificial intelligence. It was largely directed at government departments to begin research on things including AI safety as well as AI standards. It established the AI Safety Institute within the National Institute of Standards and Technology, a body tasked with analyzing safety reports from frontier labs and considering guardrails that should be established in the future.

Functionally, the EO didn't ban any research or anything like that, but it did add administrative process, which raised the ire of congressional Republicans.

The TLDR of their point was that the rules were anti-innovation. And now it's pretty clear that the restrictors are coming off as we head into this new Trump administration. Bob Gourley, the CTO of OODA, wrote: and just like that, the executive order the AI doomers and decels worked so hard to put in place has been rescinded. We have lots of problems in AI today, most of which require an ability to innovate faster. So rescinding this is a great move. Based Beff Jezos commented: total e/acc victory. We're just getting started.

Others, of course, are a little more hesitant. Former OpenAI policy researcher Miles Brundage said: so now that the AI EO is repealed, there's no legal obligation for AI companies to give the US government any kind of status updates on the technology they're building, which leaders in the field think could threaten humanity.

Staying in the government theme, although in a very different dimension, the Federal Trade Commission has raised concerns about partnerships between big tech and AI startups. Most recently, in a staff report on Friday, the FTC highlighted the competition issues stemming from partnerships between Microsoft and OpenAI, as well as Google and Amazon partnering with Anthropic. FTC Chair Lina Khan said in a statement, the FTC's report sheds light on how partnerships by big tech firms can create lock-in, deprive startups of key AI inputs, and reveal sensitive information that can undermine fair competition.

The report specifically focuses on the provision of cloud services. It claims that the partnerships could impact access to computing resources and engineering talent. It was also concerned that these partnerships could create a lock-in effect by increasing switching costs for customers. For example, OpenAI customers might find artificial barriers imposed if they try to switch away from Microsoft. Finally, the report highlighted the risk that cloud providers could gain unique access to sensitive information. It noted that at least one agreement granted access to model output data, which could be used as synthetic data for training.

Now, of course, it feels like this is the FTC positioning for a new administration. In addition to everything mentioned already, the FTC also questioned the circular spending inherent in these deals. In other words, the investment coming in the form of cloud credits or dollars that were likely to be spent on cloud services, basically giving those big tech firms protection from loss.

Still, Microsoft is standing by the partnership, with their deputy general counsel stating that the deal, quote, enabled one of the most successful AI startups in the world and spurred a wave of unprecedented technology investment and innovation in the industry. At this point, the FTC has not filed any AI-related antitrust suits.

Over in another area, however, the FTC has referred its investigation into Snap's AI chatbot to the Justice Department. The FTC's non-public complaint involves allegations that Snapchat's addition of its My AI chatbot poses, quote, risks and harms to young users. The agency noted that, quote, although the commission does not typically make public the fact that it has referred a complaint, we have determined that doing so here is in the public interest. The investigation stemmed from compliance monitoring following a 2014 settlement regarding allegations of public deception around data collection.

Snap admitted that their chatbot is prone to hallucinations and willing to answer inappropriate questions. During an investigative report in 2023, a Washington Post reporter posing as a teenager was able to get advice on hiding the smell of alcohol and marijuana. Notably, both Republican commissioners were absent from the meeting where the decision to refer was made. Commissioner Andrew Ferguson issued a dissenting opinion. He said he was not allowed to comment on the case as the details were not public, but said it ran afoul of freedom of speech protections. He commented,

I did not participate in this farcical closed meeting at which this matter was approved. Snap also hit back, saying that the company is focused on the thoughtful development of generative AI, and adding: unfortunately, on the last day of this administration, a divided FTC decided to vote out a proposed complaint that does not consider any of these efforts, is based on inaccuracies, and lacks concrete evidence. It also fails to identify any tangible harm and is subject to serious First Amendment concerns.

Safe to say that when it comes to AI policy, a lot of the next 100 days is going to be the crazy jockeying and transition between two very different administrations. I'm sure there will be much more significant news than what we've covered today, but for now, that is going to do it for this set of headlines. Appreciate you listening, and up next, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded.

Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001, centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk.

Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.

For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode.

That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business.

If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at bsuper.ai. Put the word agent in the subject line so I know what you're talking about. And let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey.

Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet, it's not all smooth sailing. The biggest challenges that they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI, they're leading the charge with practical solutions and real-world applications.

For instance, over half of the organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation, they're helping shape it. Learn more about how KPMG is driving AI innovation at kpmg.com slash US. Welcome back to the AI Daily Brief. Is OpenAI about to ship artificial general intelligence?

The conversation that we're having today got started on Friday afternoon when Sam Altman announced that OpenAI's O3 reasoning model is close to release. He posted: Thank you to the external safety researchers who tested O3 Mini. We have now finalized a version and are beginning the release process, planning to ship in a couple of weeks. Also, we heard the feedback. We'll launch API and ChatGPT at the same time. It's very good.

The hype cycle began immediately, with posts from users like Santi Geneschatz. In fact, there was so much of this type of discussion that Altman dove in, participating in a long discussion in the replies to level set expectations, including fielding a question from McKay Wrigley.

Responding to Terrace Bob, Altman also addressed who gets access: the new model will be available to at least OpenAI Pro subscribers. In other words, the folks who are paying $200 per month.

After the weekend, Sam Altman came back to Twitter with more. Now, of course, when OpenAI first previewed O3 at the end of December, to many, it was the first model that looked a little bit like AGI. It was the first to score 75% on the ARC-AGI benchmark, maybe the best yardstick we have right now for testing AGI-style performance.

However, that testing was done on the full model and used an incredible amount of compute. ARC-AGI rules allow a budget of $10,000 of inference for an official ranking. Unofficially, OpenAI also completed a run using over $100,000 of inference and scored much higher. But that level of compute isn't feasible to deliver to the public, so we're getting something much smaller and consequently less powerful.

Still, that doesn't mean this model won't be a paradigm shift in its own right. Chubby, for example, wrote: "To explain again why O3 Mini is so important: we get a reasoning model that is better than full O1 and costs only a fraction of it. At medium compute, O3 Mini is still at least a tiny bit cheaper than O1 Mini but outperforms full O1 on Codeforces by more than 100 Elo. That means better reasoning for more applications and more users. Wider application leads to more insights and more breakthroughs. That's why O3 Mini is so important."

Henry Mao, the founder of Genie AI, got specific: if O3 Mini is cheap enough, it might just supplant GPT-4o and Sonnet 3.5 for daily coding tasks.

Blake C., an app developer, weighed in as well, while Tdm suggested that this isn't really about releasing a more performant model, but rather a step towards making OpenAI's reasoning models more cost-effective. They posted: So O3 Mini is basically just faster O1. I think the primary reason they are releasing this is that the O1 cost can't be reduced enough to sustain scale without losing money on it.

Another would be for API devs to start using O3 Mini more instead of Sonnet, since it would be faster and smarter. And so, taking cues from Sam Altman, this really doesn't sound like consumer-grade AGI. And yet there are other hints that OpenAI is approaching some very big things. Axios reported over the weekend that Sam Altman has been invited to brief the Trump White House next week. The article stated that, quote, "...a top company, possibly OpenAI, in coming weeks will announce a next-level breakthrough that unleashes PhD-level superagents to do complex human tasks."

OpenAI sources said that they are, quote, both jazzed and spooked by recent progress. Interestingly, there haven't really been any public rumblings about OpenAI launching agents, but it does seem to many that this is an area where the company has been lagging behind. And yet it seems like this might not be the case for long. Tibor Blaho, for example, found references to agents in OpenAI's code. He tweeted: confirmed, the ChatGPT macOS desktop app has hidden options to define shortcuts for the desktop launcher to toggle Operator and force quit Operator.

Operator is the name of OpenAI's forthcoming general purpose agent. The Information previously reported that January was the intended launch month for Operator. Chubby once again also noted that OpenAI already has a comparison page on their website showing Operator's performance contrasted against Anthropic's computer use mode and Google's Mariner agent.

They wrote, looks like release is imminent. The benchmarks in this leaked graphic, which we don't know is real, show a substantial step up from Anthropic's model and a slight improvement over Google's dedicated web browsing agent in that domain. Still, it doesn't seem as though OpenAI has perfected computer use mode. For example, the leaked testing showed the agent could only successfully sign up for a cloud services account and launch a virtual machine 60% of the time.

Responding to some of the hype, Kumar Apparanjee, the head of automation at Cognizant, tried to tamp down expectations of what these agents can do. He posted, Not even DeepSeek R1, although it is 27x cheaper than O1.

Speaking of which, while these release rumors from OpenAI set imaginations racing, a rival Chinese lab sucked a lot of the oxygen out of the room with their latest model. Over the weekend, DeepSeek released the full version of their R1 reasoning model.

Now, you might remember that we've talked about DeepSeek a number of times. Economist Tyler Cowen used it as his example of why Trump should think differently about Biden's chip export policies. And in terms of what was released, the model performs in line with O1 on most benchmarks, in particular SWE-bench Verified, which focuses on programming tasks.

R1 is now fully available as an open source model for commercial use and is capable of serving outputs via API at less than 5% of the cost of O1. Hobbyists are also able to run the model at home, with several demonstrating that it runs on a cluster of Mac minis.
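For a sense of what that API access looks like in practice, here is a minimal sketch using the OpenAI-compatible Python client. The base URL and model name are assumptions drawn from DeepSeek's public documentation, and the API key is a placeholder, so verify both before relying on this.

```python
# Minimal sketch: querying DeepSeek R1 via an OpenAI-compatible API.
# Assumptions: the base_url and model name follow DeepSeek's published
# docs; verify both, and supply a real API key, before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # hypothetical placeholder
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model name for R1
    messages=[
        {"role": "user", "content": "Prove that the sum of two even numbers is even."}
    ],
)

# We print the final answer text; depending on the API, the model's
# chain of thought may be exposed as a separate field on the message.
print(response.choices[0].message.content)
```

The OpenAI-compatible interface is part of what makes the sub-5% cost claim interesting: switching an existing app over is, in principle, a two-line change of endpoint and model name.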

Accompanying the full release of R1 was a technical paper describing the post-training process, which develops reasoning capability on top of a foundation model. DeepSeek said they tried multiple forms of post-training before landing on a relatively simple reinforcement learning process. Max Winga, a research engineer at Conjecture AI, posted, It's wild to me that they did this with no fine-tuning prior to the RL stage. R1 learns to reason on its own like AlphaZero. During training, they observed the model learning to use advanced reasoning techniques, an aha moment. We're playing with alien minds, not just tools.
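To make that concrete, here is a toy sketch of the kind of rule-based reward the paper describes, where completions are scored on answer correctness plus adherence to a reasoning format rather than by a learned reward model. The tag format, weights, and function below are illustrative assumptions, not DeepSeek's actual training code.

```python
import re

# Toy sketch of a rule-based reward in the spirit of DeepSeek R1's RL stage:
# score a completion on (a) whether the final answer is correct and
# (b) whether the reasoning is wrapped in the expected think tags.
# Tag names, weights, and parsing here are illustrative assumptions.
THINK_PATTERN = re.compile(r"<think>(.+?)</think>\s*(.+)", re.DOTALL)

def reward(completion: str, reference_answer: str) -> float:
    match = THINK_PATTERN.match(completion.strip())
    if match is None:
        return 0.0  # no credit without a properly formatted reasoning block
    final_answer = match.group(2).strip()
    format_reward = 0.1  # small bonus for following the output structure
    accuracy_reward = 1.0 if final_answer == reference_answer else 0.0
    return format_reward + accuracy_reward

# During RL, many completions are sampled per prompt and their rewards are
# compared to reinforce the better ones (the paper uses GRPO for this step).
print(reward("<think>2+2 is 4 because...</think> 4", "4"))  # 1.1
```

The real pipeline samples a group of completions per prompt and compares each reward against the group average, which is part of what lets verifiable domains like math and code skip a learned reward model entirely.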

AI entrepreneur Elvis Saravia writes, The DeepSeek R1 paper is a gem. It's clear that LLM reasoning capabilities can be learned in different ways. Reinforcement learning, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties.

Now, all of this has some people thinking ahead to future possibilities. The AI for Success account, for example, tweets, In a few years, China will create AGI and open source it for all. DeepSeek R1 costs 96% less compared to OpenAI O1, and it's almost as good as O1. Intelligence too cheap to meter. 2025 is going to be crazy. I can feel it.

Indeed, the rapid development going on in China has major implications for AI policy. In announcing the latest round of export controls, the Biden administration made it clear that international competitiveness was a key issue. The policy statement set an explicit goal to ensure that U.S. models are dominant across the world, especially in the global south. Dean W. Ball, a research fellow at George Mason University, posted: DeepSeek R1 takeaways for policy. One, Chinese labs will likely continue to be fast followers in terms of reaching similar benchmark performance to U.S. models.

Two, the impressive performance of DeepSeek's distilled models, smaller versions of R1, means that very capable reasoners will continue to proliferate widely and be runnable on local hardware, far from the eyes of any top-down control regime, including the U.S. diffusion rule. Three, open models are going to have strategic value for the U.S., and we need to figure out ways to get more frontier open models out to the world. We rely exclusively on Meta for this right now, which, while great, is just one firm. Why do OpenAI and Anthropic not open source their older models? What would be the harm?

Mostly where people's minds are is just feeling the acceleration. Perplexity CEO Aravind Srinivas writes, It's kind of wild to see reasoning get commoditized this fast. We should fully expect an O3-level model that's open source by the end of the year, probably even mid-year. So friends, lots going on as we dig deeper into January. That, however, is going to do it for today's AI Daily Brief. Until next time, peace.