Today on the AI Daily Brief, all the most important stories in AI from this past week while I was traveling. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.
Hello, friends. Quick note before we dive in today. I'm, of course, coming off some travel. And so this week, instead of a Long Reads episode, I decided to do a bit of a catch-up on some of the most important news. There were some interesting things that went down. So let's dive right in. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. As you know, I have been out all week, so we have a lot to catch up on. And we are kicking off with some geopolitics, where the Trump administration is reportedly considering a DeepSeek ban.
According to the New York Times, further restrictions include banning the startup from purchasing U.S. technology and barring Americans from accessing DeepSeek's models. Now, the report didn't get into how exactly a government could ban open source models, but functionally, simply banning cloud providers from offering them is probably close enough. Congress also apparently has DeepSeek in its sights.
The House Select Committee on China called the AI startup a, quote, profound threat to U.S. national security by harvesting American users' data and sending it back to China. Their report states, Although it presents itself as just another AI chatbot, offering users a way to generate text and answer questions, closer inspection reveals the app siphons data back to the People's Republic of China, creates security vulnerabilities for its users, and relies on a model that covertly censors and manipulates information pursuant to Chinese law.
Now, whether or not this comes to pass tells you a lot about the state of the conversation as it relates to China and AI right now. Speaking of, another company in trouble in that area is NVIDIA. On Wednesday, the Trump administration extended export controls to cover NVIDIA's H20 chips, the downrated version of the H100 designed to comply with controls introduced in the Biden era. The administration said that the enhanced rules address concerns that, quote, the covered products may be used in or diverted to a supercomputer in China.
NVIDIA, for their part, warned that they would report $5.5 billion in write-downs associated with inventory and commitments for the chips, which have essentially zero demand outside of China. In terms of how much this would impact the development of Chinese AI, Biden Commerce Department staff said that the bans would make it around 3-6% more costly to develop an AI model in China.
Since then, we've had constant reports of Chinese researchers doing more with less, so that figure is very much up for debate. NVIDIA had been outspoken about existing export controls and lobbied against them going further. In January, as Biden's outgoing team imposed the last round of tightening, the company said that export controls, quote, will only harm the U.S. economy, set America back, and play into the hands of U.S. adversaries. Now, earlier this month, CEO Jensen Huang made a trip to Mar-a-Lago to petition the president directly.
Following his attendance at a million-dollar-a-head dinner, NPR reported that Trump had reversed course on new controls on the H20s. The report stated that bans had been set to go into effect as soon as last week. The quid pro quo had been a dramatic ramp-up of local investment from NVIDIA. NPR sources made vague reference to investments in AI data centers, but this week we've seen a wave of reports of NVIDIA's commitments to U.S. manufacturing.
On Monday, the company announced that they had begun production of Blackwell chips in TSMC's Arizona facility. They also committed to producing AI supercomputers at a pair of facilities in Texas. In total, NVIDIA claimed they would produce half a trillion dollars' worth of AI infrastructure in the U.S. over the next four years. They said their manufacturing was, quote, "...expected to create hundreds of thousands of jobs and drive trillions of dollars in economic security over the coming decades."
In the press release, Huang said, The engines of the world's AI infrastructure are being built in the United States for the first time. Adding American manufacturing helps us better meet the incredible and growing demand for AI chips and supercomputers, strengthens our supply chain, and boosts our resiliency. Alas, the high-profile announcement doesn't seem to have been enough, with the export controls still going into force two days later. The Financial Times reports the restrictions came as a complete surprise to NVIDIA, saying that earlier this month the company had assured Chinese tech giants that the supply of H20s would not be interrupted.
And so as of Thursday, Huang is visiting Beijing to meet with political and tech leaders. According to state broadcaster CCTV, the CEO said that China was a, quote, very important market for NVIDIA, and that his company would, quote, make a significant effort to optimize our products that are compliant with the regulators and continue to serve the Chinese market. Speaking of DeepSeek, sources said that his itinerary included meeting with DeepSeek founder Liang Wenfeng to discuss a new chip design to meet regulatory requirements set by Washington and Beijing.
A public meeting with the China Council for the Promotion of International Trade was televised, and Huang also reportedly met separately with Chinese Vice Premier He Lifeng. The press, for their part, is reading a lot into the deference shown by Huang, who discarded his trademark leather jacket for a suit and tie to attend high-level meetings in Beijing. Speaking about the whirlwind China visit, President Trump told the press, quote, Jensen's an amazing guy. He's become a friend of mine. He's a person that's very proud of our country. He loves our country. I'm not worried about Jensen at all.
Back home, big fundraising continues. Ilya Sutskever's Safe Superintelligence has closed a new round of funding that values the company at a whopping $32 billion. The former OpenAI chief scientist founded the startup less than a year ago. In September, SSI raised a billion dollars at a $5 billion valuation, a price tag that already seemed a little rich to some for a company with no product and little more than a big-name founder and a resonant mission statement. But obviously to many people, that would be dismissing what SSI actually has.
This round, which brought in an additional $2 billion, has marked up the valuation 6x. To put it in perspective, Anthropic was valued at $61.5 billion during last month's funding round, meaning that SSI has already achieved half that valuation. There are a couple of potential reads on the situation. The first take is that venture firms are simply not that price sensitive when it comes to getting into the very small handful of companies that actually have a chance to reach AGI first.
Reports from February said that SSI was in talks to raise at a $20 billion valuation then, so there's been a pretty significant jump in valuation during the negotiations, meaning there's a lot of demand. The second read is that SSI might have made actual progress over the past few months. Certainly everyone is wondering about what the product will look like. James Cham, a partner at venture firm Bloomberg Beta, said, Everyone is curious about exactly what Ilya is pushing and exactly what the insight is. It's super high risk, and if it works out, maybe you have the potential to be part of someone who is changing the world.
Couple more before we wrap up. Anthropic is preparing to release their long-awaited voice mode. Bloomberg reports that the feature could be released as soon as this month. Sources said the rollout will feature three different voices for Claude, identified as Airy, Mellow, and a British-accented version referred to as Buttery.
CEO Dario Amodei first teased voice mode during a January interview with the Wall Street Journal. One of the reasons he gave for the long delay was a desire to ensure that Claude's voice was comfortable and natural enough for long interactions. The rollout will also be the first big test of Anthropic's new premium $200 per month subscription.
In Microsoft land, the company has enabled a new computer use feature for Copilot Studio. This feature is similar to offerings from OpenAI and Anthropic, and allows Copilot to take over the computer to interact with websites and apps. Charles Lamanna, the VP of Copilot, said, This allows agents to handle tasks even when there is no API available to connect to the system directly. If a person can use the app, the agent can too.
Apple, meanwhile, has revealed a convoluted plan to improve their AI without compromising privacy. In a technical blog post, the company laid out a system that can check their synthetic data against tokenized user data without revealing information. The idea is that the synthetic data that most closely matches real user data can be used as the training set for Apple's next generation of models. This means that the company can technically state that their models aren't trained on user data. Apple says this method can be used to improve the performance of writing assistants, photo editing models, and their generative emoji feature.
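To make that idea concrete, here is a minimal, hypothetical sketch of the general pattern: select, entirely on-device, the synthetic example that best matches local user data, and let only the winning index leave the device. The `embed` function is a placeholder stand-in for an on-device embedding model, and Apple's actual system additionally applies differential-privacy noise before anything is aggregated, which this sketch omits.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for an on-device sentence-embedding model;
    # a real model would produce semantically meaningful vectors.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def pick_closest_synthetic(synthetic_candidates: list[str],
                           local_user_texts: list[str]) -> int:
    """Runs entirely on-device: compare each server-provided synthetic
    candidate against local user data and return ONLY the index of the
    best match -- the user text itself never leaves the device."""
    user_vecs = np.stack([embed(t) for t in local_user_texts])
    best_idx, best_score = 0, float("-inf")
    for i, candidate in enumerate(synthetic_candidates):
        sims = user_vecs @ embed(candidate)  # cosine similarities (unit vectors)
        score = float(sims.max())
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx

# The server learns only which candidate matched best, not the user's data.
candidates = ["Lunch tomorrow at noon?", "Flight delayed, landing at 9pm."]
local_data = ["ugh, my flight got delayed again", "see you at lunch"]
print(pick_closest_synthetic(candidates, local_data))
```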
Kind of an overcomplicated way to catch up with features that people were excited about back in 2024, but there you are. Separately, the New York Times reports that AI-enhanced Siri will finally arrive this year. Their sources said that current plans are to release the updated assistant in the fall; to explain the features, they gave the example of asking Siri to edit a photo and send it to a friend. A fall rollout would be well ahead of previous estimates, though, with Bloomberg tech editor Mark Gurman previously stating that he thought that Siri, quote, won't be ready until 2027 at best.
The Information writes that Apple's AI/ML group has been dubbed "AIMLess" internally, while employees are said to refer to Siri as a hot potato that is continually passed between different teams with no significant improvements. So I guess at the end of a week away, the more things change, the more they stay the same. That's going to do it for today's AI Daily Brief Headlines Edition. Next up, the main episode. Today's episode is brought to you by Vanta.
Vanta is a trust management platform that helps businesses automate security and compliance, enabling them to demonstrate strong security practices and scale. In today's business landscape, businesses can't just claim security, they have to prove it. Achieving compliance with frameworks like SOC 2, ISO 27001, HIPAA, and GDPR is how businesses can demonstrate strong security practices.
And we see how much this matters every time we connect enterprises with agent services providers at Superintelligent. Many of these compliance frameworks are simply not negotiable for enterprises.
The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easier and faster by automating compliance across 35+ frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC whitepaper found that Vanta customers achieved $535,000 per year in benefits, and the platform pays for itself in just three months.
The proof is in the numbers. More than 10,000 global companies trust Vanta, including Atlassian, Quora, and more. For a limited time, listeners get $1,000 off at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off.
Today's episode is brought to you by Superintelligent, and more specifically, Super's Agent Readiness Audits. If you've been listening for a while, you have probably heard me talk about this, but basically the idea of the Agent Readiness Audit is that this is a system we've created to help you benchmark and map opportunities in your organization where agents could specifically help you solve your problems and create new opportunities, in a way that, again, is completely customized to you.
When you do one of these audits, what you're going to do is a voice-based agent interview, where we work with some number of your leadership and employees to map what's going on inside the organization and to figure out where you are in your agent journey.
That's going to produce an Agent Readiness Score that comes with a deep set of explanations: strengths, weaknesses, key findings, and of course, a set of very specific recommendations, which we then have the ability to help you fulfill by finding the right partners.
So if you are looking for a way to jumpstart your agent strategy, send us an email at agent@besuper.ai and let's get you plugged into the agentic era. Welcome back to the AI Daily Brief. It's pretty clear that the big news this week in AI was the introduction by OpenAI of a set of new reasoning models. On Wednesday, OpenAI released O3 and O4 Mini.
O3 is their most advanced reasoning model to date, while O4 Mini is being pitched as a competitive tradeoff between price, speed, and performance. There's also a high-resource version of O4 Mini called O4 Mini High. So the trend of OpenAI having completely clear names continues. The new batch of reasoning models introduces some new features to the O series family. First, the models can integrate images into their reasoning process.
We've seen something along these lines show up as an emergent property of multimodal models like Google's Gemini, but this will be the first time that OpenAI has pushed the limits on what the reasoning modality can do. OpenAI told VentureBeat, These models don't just see an image, they think with it. It unlocks a new class of problem-solving that blends visual and textual reasoning.
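For a sense of what exercising this looks like in practice, here is a minimal sketch of passing an image to a reasoning model through the standard OpenAI Python SDK, assuming your account has API access to the model. The image URL and prompt are hypothetical placeholders, not from OpenAI's announcement.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Send a text question plus an image; the model can reason over both.
# The URL here is a hypothetical placeholder.
response = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What does this whiteboard diagram imply about the system's bottleneck?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/whiteboard.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```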
The other big improvement is tool use, with the new models natively trained on common tools. In the announcement, the company wrote that for the first time their reasoning models can agentically use and combine every tool within ChatGPT, including searching the web, analyzing files with Python, reasoning about visual inputs, and even generating images.
Now, this could represent a big jump in agentic capabilities. For agents, being able to figure out the right tools to use for any given situation is going to be one of the biggest unlocks and is pretty key to ultimately enabling fully autonomous agents. Right now, one of the most common failure states for agents is either failing to recognize when to use a tool or failing to access the tool properly.
Now, it wouldn't be a new model release without a whole bunch of benchmarks that you're not exactly sure what they mean or how much to care about. And in fact, the tool use appears to be showing up in the results here. O4 Mini, for example, managed to score 99.5% on the AIME 2025 mathematics competition, but only when given access to a Python interpreter. More broadly, OpenAI is claiming that O3 is state-of-the-art across standard coding, science, and agentic benchmarks.
However, as you all have heard me say before, I think that given the challenges of benchmarks, it's much more relevant to see what people are actually doing with these tools. Kelsey Piper of Vox's Future Perfect said that O4 Mini High is the first model to pass her own, quote, personal secret benchmark for hallucinations and complex reasoning. Her test involves presenting a complex mid-game chess position along with the prompt, mate in one.
The catch is that there is no checkmate in one move. AI models are trained on extensive chess puzzles of this kind, but their training set doesn't necessarily include this kind of trick question. Piper said that her prior testing showed that models reason through thousands of possibilities before hallucinating a solution. This generally involves adding extra pieces to the board or illegal moves. The models will then add lengthy justifications for why their hallucinated solution is correct.
She had run this test on every Claude model to date, as well as Gemini 2.5 Pro, O3 Mini High, and Grok 3, with none figuring out that the solution is impossible. Why is this a big deal? As Piper put it, I invented this problem because I think it gets at the core of AI's potential and limitations. An AI that can't question its premises will always be limited. An AI that doubles down on its own wrong answers will too.
She noted that the reasoning trace was eight minutes long, much longer than any other query she ran, saying, "...that's a lot of places to potentially make mistakes and hallucinate a solution. Its expectation that there was a solution was very strong, but it overcame it." She added in conclusion, however, "...that said, its explanation of why there was no checkmate in fact still contains some chess inaccuracies, which I know it knows better than, so certainly don't trust these things, but know they're continually getting better."
An even more vociferous endorsement came from economist Tyler Cowen, who wrote, I think it's AGI. Seriously. Try asking it lots of questions and then ask yourself, just how much smarter was I expecting AGI to be? I've argued in the past AGI, however you define it, is not much of a social event per se. It will still take us a long time to use it properly. Benchmarks, benchmarks, blah, blah, blah. Maybe AGI is like porn. I know it when I see it. And I've seen it.
Now, I haven't had as many reps as I normally would have this week with O3 given the travel, but I am absolutely 100% in the Tyler Cowen camp here. Not necessarily that O3 is AGI, but that it doesn't matter. These models have so far to me been an absolute step change improvement relative to O1 and what we were using in the past. I've been testing them as a business thought partner, and the reasoning is so much more thorough, so much more interesting, and just generally better.
In fact, I've implored, by which I mean basically demanded that everyone inside Superintelligent start playing around with O3 as a brainstorming partner for pretty much everything. I genuinely think it's that good. Now, I think it'll still take some time for us to figure out exactly what the best use cases for these models are. Although if enough people like me demand that all their colleagues use it for every business interaction from here on out, I'm sure we'll figure it out more quickly. Still, one use case that people jumped on very fast
was that O3 appears to be disturbingly good at geoguessing. Given basically any photo of a landscape or a building, the model can pinpoint its location on a map. Henri on X wrote, "Ten years ago, the CIA would have gotten on their knees for this. Every single human has just been handed an intelligent superweapon. It's only getting stranger."
I would implore you, if you haven't had a chance yet, go play with this model. Even if you don't have something specific that you're trying to do, try asking it whatever business question you're thinking through at the moment. Use it as a thought and collaboration partner and just see how different it feels as opposed to past models. It is of course totally possible that I'm in the first-few-days glow of a new toy and that it's actually not all that different, but I kind of don't think so.
Now, completely overshadowed by the O3 and O4 Mini releases, OpenAI also rolled out a new update to their non-reasoning model family earlier in the week on Monday. GPT-4.1 will be the successor to GPT-4o and is now available to developers through the API. The GPT-4.1 family includes three different sizes, with a mini and nano variant available alongside the full-size model. OpenAI says that the nano version will be their smallest, fastest, and cheapest model yet.
Another big update, the models have a million-token context window matching Google's recently released Gemini 2.5 Pro. As we've discussed before, ultra-long context windows are especially important for coding assistants and agents, allowing users to dump entire codebases into the model or run longer agentic workflows. And it seems that GPT-4.1 is explicitly aimed at coding use cases. An OpenAI spokesperson said, "...we've optimized GPT-4.1 for real-world use based on direct feedback to improve in areas that developers care most about."
Front-end coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, consistent tool usage, and more. These improvements enable developers to build agents that are considerably better at real-world software engineering tasks.
If nothing else, this is definitely OpenAI competing on price in a very aggressive way. Michelle Pokras, the post-training research lead at OpenAI, said, Not all tasks need the most intelligence or top capabilities. Nano is going to be a workhorse model for cases like autocomplete, classification, data extraction, or anything else where speed is the top concern.
Entrepreneur Paul Gauthier noted that this week's releases are more than the sum of their parts, posting, "Using O3 High as architect and GPT-4.1 as editor produced a new state-of-the-art of 83% on the Aider polyglot coding benchmark. It also substantially reduced costs compared to O3 High alone."
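As a rough illustration of the pattern Gauthier is describing, here is a minimal two-pass sketch using the standard OpenAI Python SDK: a strong reasoning model drafts the plan, and a cheaper instruction-following model writes the actual edit. This is a simplification for illustration, not how Aider itself is implemented, and it assumes API access to both models.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def architect_then_edit(task: str, code: str) -> str:
    """Two-pass 'architect/editor' pattern: the strong reasoning model
    decides WHAT to change; the cheaper model produces the actual edit."""
    # Pass 1: the architect reasons about the change but writes no code.
    plan = client.chat.completions.create(
        model="o3",
        messages=[{"role": "user", "content":
                   f"Plan, step by step, how to change this code to {task}. "
                   f"Do not write code.\n\n{code}"}],
    ).choices[0].message.content

    # Pass 2: the editor mechanically applies the plan to the file.
    return client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content":
                   f"Apply this plan and return only the full updated file.\n\n"
                   f"PLAN:\n{plan}\n\nCODE:\n{code}"}],
    ).choices[0].message.content
```

The appeal of the split is cost: the expensive reasoning model emits only a short plan, while the cheap model does the verbose token generation of writing out the full file.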
Now, speaking of coding, something we've talked a lot about on this show is how for some time, Anthropic's Claude has been the go-to choice for developers. Well, OpenAI is definitely not giving up that fight, because alongside these new models, they also rolled out a new coding agent. Sam Altman posted, "O3 and O4 Mini are super good at coding, so we're releasing a new product, Codex CLI, to make them easier to use. This is a coding agent that runs on your computer. It's fully open source and available today. We expect it to rapidly improve."
Now, because it's open source, of course, there are already forks that enable models from outside the OpenAI ecosystem as well. First reactions? Seemed decent. Gooby said, used Codex CLI with O3, used like $150 in tokens in like an hour, switching to O4 Mini now, LMAO. That being said, O3 was cooking. Fixed a couple of longstanding bugs. Rishabh Srivastava wrote, vibes for Codex CLI so far? Been a bit meh for me. Claude Code still much better. Codex with O4 Mini has been fantastic for one-shot single file edits.
Extremely good at fixing subtle bugs when specifically prompted. Meh at iteration, retaining context, and multi-file edits. Terrible at creating documentation and explaining a codebase.
So for now, maybe Claude can breathe a sigh of relief, but it's pretty clear that OpenAI wants to compete in that space, which is also validated by the fact that on Wednesday, Bloomberg reported that the company is looking to acquire Windsurf. Windsurf is probably the best-known Cursor competitor; it was valued at $1.25 billion back in August and was reportedly in talks to raise at a $3 billion valuation earlier this year.
The reports state that OpenAI is looking to make the acquisition at $3 billion, but sources say the deal hasn't been finalized and could still fall apart. Now, if you're wondering why not just buy Cursor instead, Sam Altman apparently thought of that as well and made two separate attempts to buy the leading agentic coding platform. One was late last year and another early this year. In fact, CNBC sources say that OpenAI has actually met with 20 companies in the AI coding domain before reportedly finding a deal with Windsurf.
All in all, it was an extremely busy week in OpenAI land, and I'm ignoring about a half a dozen stories that otherwise might have merited attention. For now, what I will leave you with is my strong encouragement to please go try O3, and play around with O4 Mini as well. These really do feel like a different quality of model and a different quality of experience, and I think they are going to open up some different types of use cases. For now, though, that's going to do it for today's AI Daily Brief. Until next time, peace.