
2024.08.12 Daily AI Papers | VITA Leads in Multimodal Interaction, mPLUG-Owl3 Excels at Long Image-Sequence Understanding

2024/8/12

HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Express)


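Hello everyone, and welcome to "Hugging Face Daily AI Paper Express". Today is August 12, 2024. We will take you on a quick tour of today's 10 trending AI papers, covering omni-modal large language models, multimodal understanding, visual reasoning, and other frontier areas. Now, let's dive straight into the papers.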

[00:24] 🌐 VITA: Towards Open-Source Interactive Omni Multimodal LLM

[00:58] 🦉 mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

[01:42] 🔍 Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

[02:19] 🔍 UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

[03:00] 📊 ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

[03:53] 🔄 MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

[04:36] 🔄 BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion

[05:14] 🧠 Generating novel experimental hypotheses from language models: A case study on cross-dative generalization

[05:52] 🎙 MooER: LLM-based Speech Recognition and Translation Models from Moore Threads

[06:40] 📹 Kalman-Inspired Feature Propagation for Video Face Super-Resolution

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递