The 15 papers in this episode:
[00:23] 🧠 ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
[01:01] 🧠 AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
[01:42] 🤔 Time Blindness: Why Video-Language Models Can't See What Humans Can?
[02:32] 🖼 Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
[03:13] 📊 Large Language Models for Data Synthesis
[03:59] 🖼 ViStoryBench: Comprehensive Benchmark Suite for Story Visualization
[04:39] 🧪 HardTests: Synthesizing High-Quality Test Cases for LLM Coding
[05:21] 🤖 Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
[05:59] 🤔 Vision Language Models are Biased
[06:41] 🦾 CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
[07:20] 🚀 CLaSp: In-Context Layer Skip for Self-Speculative Decoding
[08:03] 📐 UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation
[08:44] 🤔 MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
[09:28] ✍ EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering
[10:11] 🎧 Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models
【Follow us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递