
#206 - Llama 4, Nova Act, xAI buys X, PaperBench

2025/4/9
Last Week in AI

Topics
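Andrey Kurenkov and Jeremie Harris: Meta's release of the Llama 4 series of large language models has sparked debate over its performance and release timing. The series features up to 2 trillion parameters across different configurations and applications, and includes versions such as Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth. From an engineering standpoint, Llama 4 makes some progress in training efficiency and multimodal fusion, but its real-world performance has drawn mixed reviews, with some finding that it falls short of expectations. The released model differs from the model used for benchmarking, which has raised questions about whether Meta manipulated its benchmark results. In addition, the sheer size of the Llama 4 models makes them hard to run on ordinary hardware, which is at odds with the open-source ethos. We have some questions about the Llama 4 release, including whether its performance is as good as advertised and whether Meta rushed the launch. Although Llama 4 makes some engineering progress, its real-world performance is disappointing. Meta appears to have hurried the release, possibly because other companies are about to release similar models.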


Our 206th episode with a summary and discussion of last week's big AI news! Recorded on 04/07/2025

Try out the Astrocade demo here! https://www.astrocade.com/

Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at [email protected] and/or [email protected]

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

Join our Discord here! https://discord.gg/nTyezGSKwP

In this episode:

  • Meta releases Llama 4, a series of advanced large language models, sparking debate on performance and release timing, with models featuring up to 2 trillion parameters for different configurations and applications.

  • Amazon's AGI Lab debuts Nova Act, an AI agent for web browser control, boasting competitive benchmarking against OpenAI's and Anthropic's best agents.

  • OpenAI's image generation capabilities and ongoing financing developments, notably a $40 billion funding round led by SoftBank, highlight significant advancements and strategic shifts in the tech giant’s operations.

Timestamps + Links:

(00:00:00) Intro / Banter

Tools & Apps

(00:01:46) Meta releases Llama 4, a new crop of flagship AI models

(00:13:55) Amazon unveils Nova Act, an AI agent that can control a web browser

(00:17:06) Alibaba Preparing for Flagship AI Model Release as Soon as April

(00:17:59) Runway releases an impressive new video-generating AI model

(00:19:10) Adobe launches Premiere Pro’s generative AI video extender

(00:20:54) OpenAI prepares reasoning slider and memory update for ChatGPT users

Applications & Business

(00:21:28) Nvidia H20 Chips: $16 Billion Orders from ByteDance, Alibaba, and Tencent

(00:24:45) Elon Musk sells X for $33 billion to his own AI startup company xAI

(00:28:00) SoftBank dethroned Microsoft as OpenAI's largest investor, pushing the ChatGPT maker's market cap to $300 billion — but reportedly buried itself in debt

(00:30:48) DeepMind is holding back release of AI research to give Google an edge

(00:34:06) SMIC Is Rumored To Complete 5nm Chip Development By 2025; Costs Could Be Up To 50 Percent Higher Than TSMC’s Version Due To The Use Of Older-Generation Equipment

(00:36:04) Google-backed Isomorphic Labs raises $600m to advance AI drug discovery

Research & Advancements

(00:38:03) PaperBench: Evaluating AI's Ability to Replicate AI Research

(00:43:50) Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains

(00:48:39) Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

(00:54:34) Overtrained Language Models Are Harder to Fine-Tune

Policy & Safety

(00:58:28) Taking a responsible path to AGI

(01:02:32) This A.I. Forecast Predicts Storms Ahead

(01:06:24) The Secrets and Misdirection Behind Sam Altman’s Firing From OpenAI

  • OpenAI's new image generation capabilities represent significant advancements in AI tools, showcasing impressive benchmarks and multimodal functionalities.

  • OpenAI is finalizing a historic $40 billion funding round led by SoftBank, and Sam Altman is shifting his focus to technical direction while COO Brad Lightcap takes on more operational responsibilities.

  • Anthropic unveils groundbreaking interpretability research, introducing cross-layer tracers and showcasing deep insights into model reasoning through applications on Claude 3.5.

  • New challenging benchmarks such as ARC AGI 2 and complex Sudoku variations aim to push the boundaries of reasoning and problem-solving capabilities in AI models.