#199 - OpenAI's o3-mini, Gemini Thinking, Deep Research, s1

2025/2/12

Last Week in AI

This chapter compares OpenAI's o3-mini and Google's Gemini 2.0 reasoning models, highlighting their strengths and weaknesses on benchmarks such as FrontierMath. The discussion covers OpenAI's decision to reveal more of o3-mini's thought process and Google's focus on cheaper inference.

OpenAI's o3-mini outperforms previous models in reasoning benchmarks.
Google's Gemini 2.0 Flash aims for fast and cheap inference.
Competition in reasoning models is fierce, with a focus on cost-effective inference.

Our 199th episode with a summary and discussion of last week's big AI news! Recorded on 02/09/2025

Join our brand new Discord here: https://discord.gg/nTyezGSKwP

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:

OpenAI launched its deep research feature, which lets models generate detailed reports after prolonged inference periods and competes directly with Google's Gemini 2.0 reasoning models.
France and UAE jointly announce plans to build a massive AI data center in France, aiming to become a competitive player within the AI infrastructure landscape.
Mistral introduces a mobile app, broadening its consumer AI lineup amidst market skepticism about its ability to compete against larger firms like OpenAI and Google.
Anthropic unveils 'Constitutional Classifiers,' a method showing strong defenses against universal jailbreaks; they also launched a $20K challenge to find weaknesses.

Timestamps + Links:

(00:00:00) Intro / Banter

(00:02:27) News Preview

(00:03:28) Response to listener comments

Tools & Apps

(00:08:01) OpenAI now reveals more of its o3-mini model’s thought process

(00:16:03) Google’s Gemini app adds access to ‘thinking’ AI models

(00:21:04) OpenAI Unveils A.I. Tool That Can Do Research Online

(00:31:09) Mistral releases its AI assistant on iOS and Android

(00:36:17) AI music startup Riffusion launches its service in public beta

(00:39:11) Pikadditions by Pika Labs lets users seamlessly insert objects into videos

Applications & Business

(00:41:19) Softbank set to invest $40 billion in OpenAI at $260 billion valuation, sources say

(00:47:36) UAE to invest billions in France AI data centre

(00:50:34) Report: Ilya Sutskever’s startup in talks to fundraise at roughly $20B valuation

(00:52:03) ASML to Ship First Second-Gen High-NA EUV Machine in the Coming Months, Aiming for 2026 Production

(00:54:38) NVIDIA’s GB200 NVL 72 Shipments Not Under Threat From DeepSeek As Hyperscalers Maintain CapEx; Meanwhile, Trump Tariffs Play Havoc With TSMC’s Pricing Strategy

Projects & Open Source

(00:56:49) The Allen Institute for AI (AI2) Releases Tülu 3 405B: Scaling Open-Weight...

(01:00:06) SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

(01:03:56) PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

(01:08:26) OpenEuroLLM: Europe’s New Initiative for Open-Source AI Development

Research & Advancements

(01:10:34) LIMO: Less is More for Reasoning

(01:16:39) s1: Simple test-time scaling

(01:19:17) ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

(01:23:55) Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

Policy & Safety

(01:26:50) US sets AI safety aside in favor of 'AI dominance'

(01:29:39) Almost Surely Safe Alignment of Large Language Models at Inference-Time

(01:32:02) Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

(01:33:16) Anthropic offers $20,000 to whoever can jailbreak its new AI safety system

Episode length: 01:37:46
