2025.07.02 | Multimodal Reasoning Improvements; Bidirectional Embedding Optimization

2025/7/2
HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

The 12 papers in this episode:

[00:23] 💡 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

[01:00] 🖼 MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings

[01:35] 🔬 SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

[02:19] 🤔 Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

[02:59] 🎬 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

[03:37] 🤖 DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

[04:19] 🧠 HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

[04:53] 🧠 Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

[05:30] 💡 Data Efficacy for Language Model Training

[06:05] 🎬 FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion

[06:40] 🖼 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering

[07:28] 🛡 Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递