People
Anthropic
Google
Google has further strengthened its competitiveness and innovation in AI with the launch of its Gemini 2.0 AI models and the AI Mode search experience.
OpenAI
Host
A podcast host and content creator focused on electric vehicles and energy.
Topics
OpenAI: To ensure that people can freely access and benefit from AI as it advances, we need to reduce burdensome legal and bureaucratic obstacles while resisting authoritarian powers that would take people's freedoms away. The US needs federal legislation that preempts state laws in order to promote AI innovation, and should strengthen American competitiveness by promoting global adoption of American AI systems. For national security reasons, the US should ban Chinese AI and push allies to follow suit, because Chinese AI carries potential manipulation and security risks. To maintain American AI leadership, the US should allow AI models to learn from copyrighted material, even if that means revising copyright law. The US government should invest heavily in infrastructure, especially power transmission infrastructure, and drive AI adoption across government agencies. The US needs an active international economic policy to support AI innovation, while avoiding misguided regulation that harms innovation, national competitiveness, and scientific leadership.

Google: To support the competitiveness of American companies in global markets, the US should relax copyright law to allow fair use of data in AI training. The US should roll back the Biden-era export controls and craft more precisely targeted measures that support legitimate market access for US businesses while addressing the most pertinent risks. The US government should increase investment in foundational AI research to maintain its advantage in global AI leadership. To clarify liability, the US should clearly delineate responsibility between AI model providers and users, and avoid overly broad safety disclosure requirements.

Anthropic: Given that AGI is coming soon, the US government should immediately establish national security testing to evaluate the national security implications of domestic and foreign AI models. For national security, the US should strengthen export controls, including controls on chips and government-to-government agreements with countries hosting large chip deployments. The US government needs to prepare for the potential economic impact of AI, including improving mechanisms for economic data collection.

Host: The major US AI labs offered differing recommendations for the US AI Action Plan, reflecting their respective priorities and views on US AI policy. The US AI industry broadly supports accelerating AI development and seeks government support on infrastructure and policymaking to meet competition from China. The industry's AI policy recommendations blend deregulation, pro-business measures, and government subsidies, but the core goal is to speed up AI development and counter competition from China.

Chapters
Google's Gemini 2.0 Flash update includes a new native image generation feature that allows users to create and edit images using natural language prompts. This feature is multimodal, handling image and voice without conversion, and has gained significant attention for its capabilities.
  • Gemini 2.0 Flash update includes native image generation.
  • Multimodal architecture handles image and voice without conversion.
  • Users can create illustrated stories, edit images, and generate recipes with images.

Shownotes Transcript


Today on the AI Daily Brief, what the US AI action plan should be according to the big frontier model labs. And before that in the headlines, Google generates a lot of excitement with its new image generation model. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.

We haven't had cause to talk about an image generation model for a while, but today that's what a lot of people on X are talking about. The start of the story is that this week Google rolled out a major update for Gemini 2.0 Flash. Most of the new features were important but relatively mundane. The model now has access to memory, it can access a user's search history for additional context, and the deep research feature has been updated to support the latest model. However, the feature that has grabbed all the attention is a new native image generation feature.

One of the big differentiators with Google's LLMs is a native multimodal architecture. For some models, in fact this was the standard for some time, if a user asks the model to interpret an image, that image has to be converted into a text description before it can be fed into the LLM. The Gemini models can handle image and voice with no conversion in between.

Google introduced the new feature with a few sample use cases. Users can get Gemini to create an illustrated story, interspersing text with images. The model can also edit images using natural language prompts, and this is by far the most discussed part of this. For example, Google demonstrated the model adding a bouquet of flowers to a dining room table.
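To make that workflow concrete, here is a minimal sketch of what a natural-language image edit might look like from Python. It assumes the google-genai SDK; the model identifier, prompt, and response-handling details are illustrative and may differ from what Google actually ships.

```python
# Hypothetical sketch: asking a natively multimodal Gemini model to edit an image
# with a natural-language prompt. Model id and config details are assumptions.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

source = Image.open("dining_room.jpg")  # the photo to edit

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # illustrative model id
    contents=[source, "Add a bouquet of flowers to the dining room table."],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response interleaves text parts and image parts; save any returned images.
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save(f"edited_{i}.png")
    elif part.text:
        print(part.text)
```

The key point is that the image goes into the same request as the text instruction, with no caption-and-convert step in between, which is what the native multimodal architecture described above enables.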

There was also a demonstration of reasoning being combined with image generation, with Gemini shown creating a recipe with pictures of each step being completed. Google also noted, and this is an extremely important feature for me, that this type of image generation can produce really strong, clear text. Now, the power of this tool set was immediately obvious, and the internet got right to figuring out what else the new feature could do. Professor Ethan Mollick snapped a picture of a Taylor Swift crochet kit on a shelf and asked the model to make it about Napoleon, including changing the text.

Linus Ekenstam changed the background of a selfie, turned his face to the side, and then added a propeller hat. Former Anthropic developer Chris removed Dario Amodei's hair. Now, we've seen this use-text-to-edit-your-image feature theoretically embedded in other applications, but people seem to be responding to how flawless this version is. Video game concept artist Christian Panas generated an anime character. He then asked the model to place the character in a video game environment, run around a bit, and climb up a wall.

Gemini created stills following along with prompts, maintaining coherency throughout. He also demonstrated that Gemini can do a simple frame-by-frame pixel animation with sufficient prompts. This sort of style stability is a huge unlock for professional use cases. This does not, of course, mean that the model is perfect. Forvert tried a similar animation starting with a realistic but AI-generated girl's face. Over about 20 iterations, the images, to use their words, quote, slowly degraded into a horror show.

Still, it's a major step up in the state of the art for image generation, especially controllability. And being able to do it natively from a Gemini chatbot session is going to be for many people a big UX improvement. Torio quipped, so when Sam Altman said expect big improvements in image generation, he was talking about Gemini.

Next up, speaking of viral AI models that have gotten a lot of attention recently, Sesame has open-sourced their viral voice assistant, Maya. When Maya was previewed two weeks ago, it took the internet by storm. Users were timing out their conversations, having what was by all accounts a very engaging AI chat experience. It has become a cliche, of course, to refer to things as the ChatGPT moment for X, but many people argued that that's exactly what Maya was for AI voice.

The model was able to have flowing conversations, it handled interruptions seamlessly, and it used subtle human voice tics like pausing and pace changes, all of which led to Sesame arguing that they had crossed the uncanny valley of AI speech and achieved something that they called voice presence. Well, now that model is open-sourced, meaning it's freely available to developers to add to their apps.

Maya is licensed under the Apache 2.0 license, which has very few restrictions on commercial use. The model comes with a small selection of voices, but users can add their own using just a few sentences of voice samples. Using the demo on Hugging Face, Kyle Wiggers of TechCrunch said he was able to clone his voice in under a minute and start generating speech. Sesame did note that the model doesn't currently have any safeguards. They're working on the honor system and asking users not to clone people's voice without consent or engage in harmful activities.
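For developers who want to kick the tires, the open-source release means the weights can be pulled straight from Hugging Face. The snippet below is only a sketch: the repo id is an assumption based on Sesame's naming, and the actual loading and generation code lives in Sesame's own repository rather than a standard pipeline, so check the model card before relying on it.

```python
# Sketch: downloading an Apache-2.0-licensed speech model's weights from Hugging Face.
# The repo id below is a guess; check Sesame's actual model card for the real path
# and for the project-specific code needed to load and run the model.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="sesame/csm-1b")  # hypothetical repo id
print("Model files downloaded to:", local_dir)
```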

Lastly today, Chinese big tech firm Alibaba has unveiled a new version of their AI assistant app, adding basic agentic capabilities to the platform for the first time. The new version of the Quark app has now been updated to take advantage of Alibaba's latest Qwen reasoning model. The assistant can now conduct AI searches as well as deep research and task execution.

Part of why we're paying attention to Alibaba is that they've been shipping extremely fast this year and moving quickly with partnerships as well, announcing earlier this week, for example, that they were working with the viral Manus agent to bring that experience to the Chinese market. Beyond just the China-US part of this story, it's also another indication that agentic AI is fast becoming the default user interface across the board.

This release from Alibaba is explicitly about supplanting the usual browsing experience with an agentic assistant. You've got a similar phenomenon happening over in the US with tools like Perplexity and Deep Research taking market share in search. Agentic coding assistants are becoming ubiquitous. And improvements in voice models are also breaking down friction.

Aditya Sharana, an agent builder, writes, in my opinion, the latter half of 2025 will be about who makes the best AI agent interfaces for everyday use. The real winner will be the one who makes it open source. That, however, is going to do it for today's AI Daily Brief headlines. Next up, the main episode. We talk a lot about agents on this show, but if you've ever thought to yourself, I don't want to talk about agents anymore. I just want to actually build and deploy something. I'm really excited to share something special with you today.

We've partnered with Lindy to offer companies that just want to dive into the deep end of agents a way to get their feet wet, a way to move fast and build something meaningful without breaking the budget.

The first five companies that email me, nlw@bsuper.ai, with Lindy in the title, will have access to work with Lindy to build an actual functional agent serving their specific needs for under $20,000. Some of the agents you can build include a customer support agent, maybe automating responses on your website.

You could build an SDR for generating or qualifying sales leads, or you could build an agent that's perfectly suited for your internal communications needs, be it note-taking, scheduling, or something else. Not only is Lindy structured to integrate with all of the places that you already keep data and information, it's also a full extensible platform, which means as you hire more and more agent employees and really build out your digital workforce, Lindy is going to enable those agents to be interoperable and basically be able to work together in a seamless way.

So again, if you are interested in diving in all the way to agents in a matter of weeks, not months, not years, email me, nlw at bsuper.ai, put Lindy in the title, and let's get your first digital employee online. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded.

Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001, centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk.

Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and prove security in real time.

For a limited time, this audience gets $1,000 off Vanta at vanta.com slash nlw. That's v-a-n-t-a dot com slash nlw for $1,000 off. Hey listeners, are you tasked with the safe deployment and use of trustworthy AI? KPMG has a first-of-its-kind AI Risk and Controls Guide, which provides a structured approach for organizations to begin identifying AI risks and design controls to mitigate threats.

What makes KPMG's AI Risks and Controls Guide different is that it outlines practical control considerations to help businesses manage risks and accelerate value. To learn more, go to www.kpmg.us slash AI Guide. That's www.kpmg.us slash AI Guide.

Today, we are talking AI policy. And this has actually been sort of an underexplored area over the last year or so. I think the US election cycle last year crowded out a lot of space for this discussion. But as AI heats up, obviously, there's geopolitical dimensions. But there's also just the fundamental underlying question of how policy and regulation are going to interact with industry and shape the set of options that we have when it comes to these tools and how we approach them.

Now, specifically, one of the first acts of this new Trump administration was unwinding the Biden AI policy. A few days after inauguration, President Trump signed his own AI executive order, repealing the one that had been signed by President Biden. It was, as so many EOs are, filled with rhetoric but devoid of policy, leaving a vacuum in its wake. Then in late February, the administration opened a two-week comment period for input into what they're calling the AI Action Plan.

And throughout this week, big U.S.-based labs have been weighing in on the policies about what should come next. So today we're going to go over the responses from OpenAI, Google, and Anthropic and see how they are alike, how they differ, and what it says about what U.S. AI policy should be. On Thursday, OpenAI published their proposal. Setting the tone, they borrowed from Sam Altman's prior writing to state...

We are at the doorstep of the next leap in prosperity, the intelligence age. But we must ensure that people have freedom of intelligence, by which we mean the freedom to access and benefit from AI as it advances, protected from both autocratic powers that would take people's freedoms away and layers of laws and bureaucracy that would prevent our realizing them.

Now, throughout the document, OpenAI's policy prescriptions hit on five major topics. They requested a, quote, regulatory strategy that ensures the freedom to innovate. This is largely about ensuring that partnership with the federal government remains voluntary. And it repeated OpenAI's oft-made point that US AI labs should be unshackled from, quote, overly burdensome state laws. Basically, OpenAI is saying that we need federal legislation that overrides state laws, because companies are going to be significantly slowed down if they have to deal with 50 different regulatory jurisdictions.

When it comes to export controls, OpenAI wants to focus on ensuring that American AI is widely available. They suggested a, quote, strategy that would apply a commercial growth lens, both total and serviceable addressable markets, to proactively promote the global adoption of American AI systems and with them the freedoms they create.

In a larger letter that they published simultaneously, the company also requested changes to the Biden diffusion rule that divided countries into three tiers and placed limits on middle-tier countries including India and Israel. They want these limits rolled back to only cover countries with a history of failing to prevent controlled chips from entering China. And China is indeed a big area of focus for OpenAI. They in fact include a proposal to ban Chinese AI and force close allies to follow suit. Using DeepSeek as the prime example, they wrote,

As with Huawei, there is significant risk in building on top of DeepSeek models in critical infrastructure and other high-risk use cases, given the potential that DeepSeek can be compelled by the CCP to manipulate its models to cause harm. And because DeepSeek is simultaneously state-subsidized, state-controlled, and freely available, the cost to its users is their privacy and security. One of the juicier lines in their report, the CCP views violations of American IP rights as a feature, not a flaw.

There is, on the one hand, the appearance of a noteworthy tension here. OpenAI wanting a ramping down of export controls to allow American AI to be deployed globally, but at the same time proposing to force a ban on Chinese AI if countries want to maintain Tier 1 status.

I actually think it's less incoherent than it seems. It basically all amounts to a much more strict focus on China as the problematic country in the equation, and a stronger emphasis on American competitiveness and giving people access to non-Chinese models as a competitive force.

One of the more controversial suggestions from OpenAI was a carve-out from copyright laws to allow AI training. OpenAI tried to frame this as a balanced approach that still protects content creators, but asserted, "...the federal government can both secure Americans' freedom to learn from AI and avoid forfeiting our AI lead to the People's Republic of China by preserving American AI models' ability to learn from copyrighted material." They commented that if Chinese developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over.

OpenAI gave the example of the EU model where data mining is allowed but there are broad opt-outs for rights holders. They noted that the UK is also leaning in this direction with revisions to their copyright law currently being debated. One of the big points of discussion on Twitter following this was people pointing out that OpenAI had invoked national security concerns to justify a copyright exception, but whether you think that justifies a copyright exception or not, which reasonable people can disagree on,

I do think that we have to assume that China will 100% not care about copyright when it comes to enabling its companies to train models on copyrighted materials. And so if we decide to care about that, if we decide that the rights of copyright holders are important enough to stop training on their materials, that is effectively accepting that's an advantage that the U.S. is going to let China have. Again, I'm not drawing judgment one way or another. I'm just saying that is implicitly a part of this conversation.

On infrastructure, OpenAI proposed a wide range of government investment, including open-sourcing government datasets and a range of other measures large and small. Their biggest ask was a massive build-out of power transmission infrastructure that equals the national highway build-out of the 1950s in scope. Finally, OpenAI suggested a big push to drive AI adoption within government departments. They included a range of policy changes that would make the process easier, but the bottom line was encouraging public-private partnerships to update the government's tech stack.

Their overarching view on strategy was that, quote, the U.S. needs to pursue an active international economic policy to advocate for American values and support AI innovation internationally. For too long, they said, AI policymaking has paid disproportionate attention to the risks, often ignoring the costs that misguided regulation can have on innovation, national competitiveness, and scientific leadership, a dynamic that is beginning to shift under the new administration.

Google asked for a similar fair use exemption from copyright infringement in training data. They didn't go so far as to label copyright enforcement a national security risk, but did claim that fair use could be allowed, quote, without significantly impacting rights holders. Their argument was largely commercial, the point being that negotiations with data rights holders are lengthy and highly unpredictable. The company also called for a winding back of the Biden-era export controls, calling for the replacement to be, quote, carefully crafted to support legitimate market access for U.S. businesses while targeting the most pertinent risks.

Again, they pointed out that placing additional burdens on companies is likely to put them at a disadvantage in the global market. Also present was encouragement for AI adoption in government agencies, along with uniform federal laws and government spending on infrastructure. Interestingly, Google seemed to take issue with the Trump administration's push to cut the budget for foundational R&D. They claimed that, quote, long-term sustained investment in AI research had given the U.S. a, quote, crucial advantage in the race for global AI leadership.

Google instead called for the government to significantly bolster these efforts.

On safety, Google seemed to be calling for a liability shield similar to the one handed to internet companies in the 90s. They called for a clear delineation between the responsibilities of model providers and users, noting, "...the actor with the most control over a specific step in the AI lifecycle should bear responsibility and any associated liability for that step. In many instances, the original developer of an AI model has little to no visibility or control over how it's being used by a deployer and may not interact with end users."

Google also criticized safety disclosure requirements in the EU as overly broad and opposed transparency rules along those lines.

Moving on to Anthropic, their proposal had a very different set of priorities, as seemingly befits the safety-focused company.

Their central premise, one that CEO Dario Amodei has been making in interviews recently, is that AGI is coming and the government only has a couple of years to prepare. Their number one recommendation was establishing national security testing for model capabilities. Anthropic proposed tests of both domestic and foreign models for national security implications. Their proposal on export controls was to ramp them up significantly. In addition to chip controls, they called for requiring government-to-government agreements for countries hosting large chip deployments, and reducing no-license-required thresholds.

Anthropic also suggested bringing AI labs into the intelligence structure. They called for classified communication channels with intelligence agencies, expedited security clearances for AI professionals, and, quote, the development of next-generation security standards for AI infrastructure. Joining the other labs, Anthropic called for scaling energy infrastructure and accelerating government AI adoption. Their final recommendation leaned on recent discussions by Amodei, suggesting that the government needs to start preparing for the economic impact of AI.

They write, to ensure AI benefits are broadly shared throughout society, we recommend modernizing mechanisms for economic data collection, like the Census Bureau surveys, and preparing for potential large-scale changes to the economy. Now, if you've been paying any attention at all to what Amodei has been saying lately, you'll understand exactly where this set of priorities is coming from. Dario noted that China is known for, quote, large-scale industrial espionage.

He commented that Anthropic and all AI companies are almost certainly being targeted, adding, many of these algorithmic secrets, and there are $100 million worth of secrets that are a few lines of code, and you know I'm sure there are folks trying to steal them, and they may be succeeding. This is not unfounded paranoia, as numerous government officials have warned of a sharp uptick in foreign spies trying to infiltrate tech companies over the past year. Some have even made the point that during the era of nuclear science, the entire field was classified information.

But in the AI era, methods of building advanced technology are openly discussed at Silicon Valley house parties. So what is the sum total of all of this? A couple things that stood out to me. The general tenor of the submissions is very accelerationist. Even Anthropic's more alarmist proposal is not asking for AI progress to be slowed down. Instead, it's basically asking for an equal commitment to accelerate certain safety aspects at the same time. And it's very clear that the entire US industry wants to move faster to beat China to develop powerful AI.

In terms of the approaches, there's a mix of deregulation, pro-business measures, and government subsidy being proposed. But the biggest point of consensus is that everyone wants restrictions to be lifted and a well-defined policy structure that allows the labs to accelerate. If you've been tracking the changes in attitude, none of this is particularly surprising. It's just very notable how much the tone has shifted from a year ago, and especially from two years ago. There were basically zero concessions here to generalized safety concerns.

And to the extent that there were safety proposals, the risks were clearly defined and came with specific recommendations to mitigate them. TLDR, if you think things are moving fast now, you ain't seen nothing yet. Now, of course, what the Trump administration does with all of this remains to be seen, but we will, of course, cover that as it comes out. For now, that is going to do it for today's AI Daily Brief. Appreciate you listening, as always. And until next time, peace.