
2025.01.31 | GuardReasoner improves LLM safety; MedXpertQA challenges medical AI reasoning.

2025/1/31

HuggingFace Daily AI Paper Digest

This episode covers the following 8 papers:

[00:25] 🛡 GuardReasoner: Towards Reasoning-based LLM Safeguards

[01:04] 🩺 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

[01:58] 🧠 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

[02:40] 🌐 Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

[03:20] 🌍 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

[04:09] 🤖 WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

[05:04] 🛡 o3-mini vs DeepSeek-R1: Which One is Safer?

[05:41] 🤔 Large Language Models Think Too Fast To Explore Effectively

【Follow Us】

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (小红书): AI速递