#212 - o3 pro, Cursor 1.0, ProRL, Midjourney Sued

2025/6/17

Last Week in AI

People

Andrey Kurenkov

Jeremie Harris

Topics

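Andrey Kurenkov: OpenAI released a new reasoning model, O3 Pro, with performance comparable to O1 and a steep 80% price cut. Meanwhile, the release of its open-source AI model has been pushed back to later this summer. O3 Pro performs strongly on benchmarks and outperforms earlier models.

Jeremie Harris: Human testers preferred O3 Pro in every category, including personal writing, computer programming, and data analysis. OpenAI also uses an evaluation in which a question counts as solved only if the model answers it correctly on all four attempts, to check that the agent behaves reliably in higher-stakes scenarios.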

Deep Dive

Chapters

OpenAI released O3 Pro, a significantly improved reasoning model for ChatGPT, boasting better performance than previous versions and a substantial price reduction. Human testers overwhelmingly preferred O3 Pro across various tasks.

O3 Pro surpasses O1 and O3 Medium in performance benchmarks.
Price of O3 model decreased by 80%.
O3 Pro preferred to O3 by human testers 64% of the time across various tasks.

Shownotes

Our 212th episode with a summary and discussion of last week's big AI news! Recorded on 06/33/2025

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

OpenAI introduces O3 Pro for ChatGPT, highlighting significant improvements in performance and cost-efficiency.
Anthropic sees an influx of talent from OpenAI and DeepMind, with significantly higher retention rates and competitive advantages in AI capabilities.
New research indicates that reinforcing negative responses in LLMs significantly improves performance across all metrics, highlighting novel approaches in reinforcement learning.
A security flaw in Microsoft Copilot demonstrates the growing risk of AI agents being hacked, emphasizing the need for robust protection against zero-click attacks.

Timestamps + Links:

(00:00:11) Intro / Banter

(00:01:31) News Preview

(00:02:46) Response to Listener Reviews

Tools & Apps

(00:04:48) OpenAI adds o3 Pro to ChatGPT and drops o3 price by 80 per cent, but open-source AI is delayed

(00:09:10) Cursor AI editor hits 1.0 milestone, including BugBot and high-risk background agents

(00:13:07) Mistral releases a pair of AI reasoning models

(00:16:18) Elevenlabs' Eleven v3 lets AI voices whisper, laugh and express emotions naturally

(00:19:00) ByteDance's Seedance 1.0 is trading blows with Google's Veo 3

(00:22:42) Google Reveals $20 AI Pro Plan With Veo 3 Fast Video Generator For Budget Creators

Applications & Business

(00:25:42) OpenAI and DeepMind are losing engineers to Anthropic in a one-sided talent war

(00:34:32) OpenAI slams court order to save all ChatGPT logs, including deleted chats

(00:37:24) Nvidia’s Biggest Chinese Rival Huawei Struggles to Win at Home

(00:43:06) Huawei Expected to Break Semiconductor Barriers with Development of High-End 3nm GAA Chips; Tape-Out by 2026

(00:45:21) TSMC’s 1.4nm Process, Also Called Angstrom, Will Make Even The Most Lucrative Clients Think Twice When Placing Orders, With An Estimate Claiming That Each Wafer Will Cost $45,000

(00:47:43) Mistral AI Launches Mistral Compute To Replace Cloud Providers from US, China

Projects & Open Source

(00:51:26) ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Research & Advancements

(00:57:27) Kinetics: Rethinking Test-Time Scaling Laws

(01:05:12) The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

(01:10:45) Predicting Empirical AI Research Outcomes with Language Models

(01:15:02) EXP-Bench: Can AI Conduct AI Research Experiments?

Policy & Safety

(01:20:07) Large Language Models Often Know When They Are Being Evaluated

(01:24:56) Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence

(01:31:16) Exclusive: New Microsoft Copilot flaw signals broader risk of AI agents being hacked—‘I would be terrified’

(01:35:01) Claude Gov Models for U.S. National Security Customers

Synthetic Media & Art

(01:37:32) Disney And NBCUniversal Sue AI Company Midjourney For Copyright Infringement

(01:40:39) AMC Networks is teaming up with AI company Runway

Episode length: 01:46:08
