#206 - Llama 4, Nova Act, xAI buys X, PaperBench

2025/4/9

Last Week in AI

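Andrey Kurenkov and Jeremie Harris: Meta's Llama 4 family of large language models has sparked debate over both its performance and its release timing. The family includes several variants, among them Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth, with up to 2 trillion parameters across configurations aimed at different applications. From an engineering standpoint, Llama 4 makes some progress in training efficiency and multimodal fusion, but its real-world performance has drawn mixed reviews, and some consider it below expectations. Differences between the released models and the versions used for benchmarking have raised questions about whether Meta gamed its benchmark results. The models' sheer size also makes them hard to run on ordinary hardware, which sits uneasily with the open-source ethos. We have our own questions about the launch, including whether performance lives up to the marketing and whether Meta rushed the release. Despite the engineering advances, actual performance has been disappointing, and Meta appears to have hurried the launch, possibly because other companies were about to release comparable models.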

Our 206th episode with a summary and discussion of last week's big AI news! Recorded on 04/07/2025

Try out the Astrocade demo here! https://www.astrocade.com/

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

Join our Discord here! https://discord.gg/nTyezGSKwP

In this episode:

Meta releases Llama 4, a series of advanced large language models that has sparked debate over performance and release timing, with configurations of up to 2 trillion parameters for different applications.
Amazon's AGI Lab debuts Nova Act, an AI agent for web browser control, claiming benchmark results competitive with OpenAI's and Anthropic's best agents.
OpenAI's image generation capabilities and its financing, notably a $40 billion funding round led by SoftBank, mark significant advances and strategic shifts for the company.

Timestamps + Links:

(00:00:00) Intro / Banter

Tools & Apps

(00:01:46) Meta releases Llama 4, a new crop of flagship AI models

(00:13:55) Amazon unveils Nova Act, an AI agent that can control a web browser

(00:17:06) Alibaba Preparing for Flagship AI Model Release as Soon as April

(00:17:59) Runway releases an impressive new video-generating AI model

(00:19:10) Adobe launches Premiere Pro’s generative AI video extender

(00:20:54) OpenAI prepares reasoning slider and memory update for ChatGPT users

Applications & Business

(00:21:28) Nvidia H20 Chips: $16 Billion Orders from ByteDance, Alibaba, and Tencent

(00:24:45) Elon Musk sells X for $33 billion to his own AI startup company xAI

(00:28:00) SoftBank dethroned Microsoft as OpenAI's largest investor, pushing the ChatGPT maker's market cap to $300 billion — but reportedly buried itself in debt

(00:30:48) DeepMind is holding back release of AI research to give Google an edge

(00:34:06) SMIC Is Rumored To Complete 5nm Chip Development By 2025; Costs Could Be Up To 50 Percent Higher Than TSMC’s Version Due To The Use Of Older-Generation Equipment

(00:36:04) Google-backed Isomorphic Labs raises $600m to advance AI drug discovery

Research & Advancements

(00:38:03) PaperBench: Evaluating AI's Ability to Replicate AI Research

(00:43:50) Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

(00:48:39) Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

(00:54:34) Overtrained Language Models Are Harder to Fine-Tune

Policy & Safety

(00:58:28) Taking a responsible path to AGI

(01:02:32) This A.I. Forecast Predicts Storms Ahead

(01:06:24) The Secrets and Misdirection Behind Sam Altman’s Firing From OpenAI

OpenAI's new image generation capabilities represent significant advancements in AI tools, showcasing impressive benchmarks and multimodal functionalities.
OpenAI is finalizing a historic $40 billion funding round led by SoftBank, and Sam Altman is shifting his focus to technical direction while COO Brad Lightcap takes on more operational responsibilities.
Anthropic unveils new interpretability research, introducing cross-layer transcoders and showing detailed insights into model reasoning through applications on Claude 3.5.
New challenging benchmarks such as ARC-AGI-2 and complex Sudoku variants aim to push the limits of reasoning and problem-solving in AI models.

Episode length: 01:13:44