The 8 papers in this episode are:
[00:25] 🛡 GuardReasoner: Towards Reasoning-based LLM Safeguards
[01:04] 🩺 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
[01:58] 🧠 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
[02:40] 🌐 Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
[03:20] 🌍 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
[04:09] 🤖 WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
[05:04] 🛡 o3-mini vs DeepSeek-R1: Which One is Safer?
[05:41] 🤔 Large Language Models Think Too Fast To Explore Effectively
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递