Papers Read on AI

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

2024/9/19

Recent advances in language models have achieved significant progress. GPT-4o, as a new milestone, h

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

2024/9/18

Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, si

GeoCalib: Learning Single-image Calibration with Geometric Optimization

2024/9/17

From a single image, visual cues can help deduce intrinsic and extrinsic camera parameters like the

Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

2024/9/13

Insect production for food and feed presents a promising supplement to ensure food safety and addres

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

2024/9/12

Recent advancements in audio generation have been significantly propelled by the capabilities of Lar

rerankers: A Lightweight Python Library to Unify Ranking Methods

2024/9/11

This paper presents rerankers, a Python library which provides an easy-to-use interface to the most

Automated Design of Agentic Systems

2024/9/10

Researchers are investing substantial effort in developing powerful general-purpose agents, wherein

Text2SQL is Not Enough: Unifying AI and Databases with TAG

2024/9/9

AI systems that serve natural language questions over databases promise to unlock tremendous value.

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

2024/9/5

The ability to accurately interpret complex visual information is a crucial topic of multimodal larg

Sapiens: Foundation for Human Vision Models

2024/9/4

We present Sapiens, a family of models for four fundamental human-centric vision tasks -- 2D pose es

OctFusion: Octree-based Diffusion Models for 3D Shape Generation

2024/9/3

Diffusion models have emerged as a popular method for 3D generation. However, it is still challengin

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

2024/9/2

In this paper, we introduce Writing in the Margins (WiM), a new inference pattern for Large Language

Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs

2024/8/30

Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering na

RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

2024/8/29

Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowle

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

2024/8/28

Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external kno

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

2024/8/23

We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lea

Episodes