2025.01.31 | GuardReasoner improves LLM safety; MedXpertQA challenges medical AI reasoning.

2025/1/31
HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

The 8 papers in this episode are:

[00:25] 🛡 GuardReasoner: Towards Reasoning-based LLM Safeguards

[01:04] 🩺 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

[01:58] 🧠 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

[02:40] 🌐 Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

[03:20] 🌍 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

[04:09] 🤖 WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

[05:04] 🛡 o3-mini vs DeepSeek-R1: Which One is Safer?

[05:41] 🤔 Large Language Models Think Too Fast To Explore Effectively

[Follow us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递