2025.02.18 | Sparse attention boosts efficiency; humanoid robots learn get-up policies.
21:21
2025/2/18
HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)
Chapters
⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
🤖 Learning Getting-Up Policies for Real-World Humanoid Robots
🧠 ReLearn: Unlearning via Learning for Large Language Models
💻 SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
🌐 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
🧠 How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
🤖 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
🔧 Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
🧠 I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
🔧 SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
🧠 CRANE: Reasoning with constrained LLM generation
🧠 Intuitive physics understanding emerges from self-supervised pretraining on natural videos
🐦 Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
🧠 Dyve: Thinking Fast and Slow for Dynamic Process Verification
🧠 PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
🤖 System Message Generation for User Preferences using Open-Source Models
🎥 video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
🧠 Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
🤖 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
🤖 MagicArticulate: Make Your 3D Models Articulation-Ready
🤖 Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
🧠 One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
🤖 Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
🚀 Better Embeddings with Coupled Adam
🧐 Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
🧪 Towards Data-Efficient Pretraining for Atomic Property Prediction
🌀 The Mirage of Model Editing: Revisiting Evaluation in the Wild
🧮 Large Language Models and Mathematical Reasoning Failures
📊 Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance