Keeping you up to date with the latest trends and best performing architectures in this fast evolvin
Large language models (LLMs) demonstrate powerful capabilities, but they still face challenges in pr
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a pers
Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generali
Large language models have exhibited emergent abilities, demonstrating exceptional performance acros
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl
This paper surveys research works in the quickly advancing field of instruction tuning (IT), a cruci
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs. Our system
The capacity of a neural network to absorb information is limited by its number of parameters. Condi
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM)
We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Model
Foundation models, now powering most of the exciting applications in deep learning, are almost unive
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently sa
The dominant paradigm for instruction tuning is the random-shuffled training of maximally diverse in
Weight initialization plays an important role in neural network training. Widely used initialization
Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality
This paper does not present a novel method. Instead, it delves into an essential, yet must-know base
Large Language Models (LLMs) have shown impressive abilities in natural language understanding and g
Large language models (LLMs) have demonstrated remarkable performance and tremendous potential acros
Large language models (LLMs) have dramatically enhanced the field of language intelligence, as demon
Generating step-by-step "chain-of-thought" rationales improves language model performance on complex r