Keeping you up to date with the latest trends and best performing architectures in this fast evolvin
Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in pr
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a pers
Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generali
Large language models have exhibited emergent abilities, demonstrating exceptional performance acros
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a cruci
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs. Our system
The capacity of a neural network to absorb information is limited by its number of parameters. Condi
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM)
We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Model
Foundation models, now powering most of the exciting applications in deep learning, are almost unive
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently sa
The dominant paradigm for instruction tuning is the random-shuffled training of maximally diverse in
Weight initialization plays an important role in neural network training. Widely used initialization
Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality
This paper does not present a novel method. Instead, it delves into an essential, yet must-know base
Large Language Models (LLMs) have shown impressive abilities in natural language understanding and g
Large language models (LLMs) have demonstrated remarkable performance and tremendous potential acros
Large language models (LLMs) have dramatically enhanced the field of language intelligence, as demon
Generating step-by-step "chain-of-thought" rationales improves language model performance on complex r