#212 - o3 pro, Cursor 1.0, ProRL, Midjourney Sued

2025/6/17
Last Week in AI

People
Andrey Kurenkov
Jeremie Harris
Topics
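Andrey Kurenkov: OpenAI released a new reasoning model, O3 Pro, with performance comparable to O1 and a price cut of 80%. Meanwhile, the release of its open-source AI model has been pushed back to later in the summer. O3 Pro performs well on benchmarks, outperforming previous models.
Jeremie Harris: Human testers preferred O3 Pro across categories including personal writing, computer programming, and data analysis. OpenAI uses an evaluation in which the model must answer a question correctly on all four attempts, to check that agents perform consistently in higher-stakes scenarios.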

Chapters
OpenAI released O3 Pro, a significantly improved reasoning model for ChatGPT, boasting better performance than previous versions and a substantial price reduction. Human testers overwhelmingly preferred O3 Pro across various tasks.
  • O3 Pro surpasses O1 and O3 Medium in performance benchmarks.
  • Price of O3 model decreased by 80%.
  • O3 Pro preferred to O3 by human testers 64% of the time across various tasks.

Shownotes

Our 212th episode with a summary and discussion of last week's big AI news! Recorded on 06/33/2025

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

  • OpenAI introduces o3 Pro for ChatGPT, highlighting significant improvements in performance and cost-efficiency.

  • Anthropic sees an influx of talent from OpenAI and DeepMind, with significantly higher retention rates and competitive advantages in AI capabilities.

  • New research indicates that negative reinforcement, i.e. penalizing incorrect responses, significantly improves LLM reasoning performance across all metrics, highlighting novel approaches in reinforcement learning.

  • A security flaw in Microsoft Copilot demonstrates the growing risk of AI agents being hacked, emphasizing the need for robust protection against zero-click attacks.

Timestamps + Links:

(00:00:11) Intro / Banter

(00:01:31) News Preview

(00:02:46) Response to Listener Reviews

Tools & Apps

(00:04:48) OpenAI adds o3 Pro to ChatGPT and drops o3 price by 80 per cent, but open-source AI is delayed

(00:09:10) Cursor AI editor hits 1.0 milestone, including BugBot and high-risk background agents

(00:13:07) Mistral releases a pair of AI reasoning models

(00:16:18) Elevenlabs' Eleven v3 lets AI voices whisper, laugh and express emotions naturally

(00:19:00) ByteDance's Seedance 1.0 is trading blows with Google's Veo 3

(00:22:42) Google Reveals $20 AI Pro Plan With Veo 3 Fast Video Generator For Budget Creators

Applications & Business

(00:25:42) OpenAI and DeepMind are losing engineers to Anthropic in a one-sided talent war

(00:34:32) OpenAI slams court order to save all ChatGPT logs, including deleted chats

(00:37:24) Nvidia’s Biggest Chinese Rival Huawei Struggles to Win at Home

(00:43:06) Huawei Expected to Break Semiconductor Barriers with Development of High-End 3nm GAA Chips; Tape-Out by 2026

(00:45:21) TSMC’s 1.4nm Process, Also Called Angstrom, Will Make Even The Most Lucrative Clients Think Twice When Placing Orders, With An Estimate Claiming That Each Wafer Will Cost $45,000

(00:47:43) Mistral AI Launches Mistral Compute To Replace Cloud Providers from US, China

Projects & Open Source

(00:51:26) ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Research & Advancements

(00:57:27) Kinetics: Rethinking Test-Time Scaling Laws

(01:05:12) The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

(01:10:45) Predicting Empirical AI Research Outcomes with Language Models

(01:15:02) EXP-Bench: Can AI Conduct AI Research Experiments?

Policy & Safety

(01:20:07) Large Language Models Often Know When They Are Being Evaluated

(01:24:56) Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence

(01:31:16) Exclusive: New Microsoft Copilot flaw signals broader risk of AI agents being hacked—‘I would be terrified’

(01:35:01) Claude Gov Models for U.S. National Security Customers

Synthetic Media & Art

(01:37:32) Disney And NBCUniversal Sue AI Company Midjourney For Copyright Infringement

(01:40:39) AMC Networks is teaming up with AI company Runway