Papers Read on AI

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

2024/3/25

In this paper, we present a new embedding model, called M3-Embedding, which is distinguished for its

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

2024/3/22

By providing external information to large language models (LLMs), tool augmentation (including retr

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

2024/3/21

As Large Language Models (LLMs) continue to advance in their ability to write human-like text, a key

Chronos: Learning the Language of Time Series

2024/3/19

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series mode

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

2024/3/18

Advancing the frontier of subquadratic architectures for Language Models (LMs) is crucial in the rap

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

2024/3/15

We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian

Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents

2024/3/14

Recent advancements on Large Language Models (LLMs) enable AI Agents to automatically generate and e

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

2024/3/13

Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to t

TripoSR: Fast 3D Object Reconstruction from a Single Image

2024/3/12

This technical report introduces TripoSR, a 3D reconstruction model leveraging transformer architect

Diffusion Model-Based Image Editing: A Survey

2024/3/8

Denoising diffusion models have emerged as a powerful tool for various image generation and editing

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

2024/3/7

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLM

Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

2024/3/6

We introduce Bonito, an open-source model for conditional task generation: the task of converting un

Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases

2024/3/5

Prompt engineering is a challenging and important task due to the high sensitivity of Large Language

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

2024/3/4

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is train

BitDelta: Your Fine-Tune May Only Be Worth One Bit

2024/2/27

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-sca

Ring Attention with Blockwise Transformers for Near-Infinite Context

2024/2/26

Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcas

Premise Order Matters in Reasoning with Large Language Models

2024/2/23

Large language models (LLMs) have accomplished remarkable reasoning performance in various domains.

Generative Representational Instruction Tuning

2024/2/20

All text-based language problems can be reduced to either generation or embedding. Current models on

DoRA: Weight-Decomposed Low-Rank Adaptation

2024/2/19

Among the widely used parameter-efficient finetuning (PEFT) methods, LoRA and its variants have gain

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

2024/2/18

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various h
