Papers Read on AI

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

2024/1/25

Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one

Self-Rewarding Language Models

2024/1/24

We posit that to achieve superhuman agents, future models require superhuman feedback in order to pr

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

2024/1/23

Recently the state space models (SSMs) with efficient hardware-aware designs, i.e., Mamba, have show

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

2024/1/22

Code generation problems differ from common natural language problems - they require matching the ex

Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

2024/1/19

As large language models (LLMs) are adopted as a fundamental component of language technologies, it

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

2024/1/18

This work elicits LLMs' inherent ability to handle long contexts without fine-tuning. The limited le

Large Language Models for Generative Information Extraction: A Survey

2024/1/17

Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and e

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

2024/1/16

In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managi

Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon

2024/1/15

The utilization of long contexts poses a big challenge for large language models due to their limite

Parameter-Efficient Transfer Learning for NLP

2024/1/13

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the pres

Mixtral of Experts

2024/1/12

We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same a

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

2024/1/11

State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challe

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

2024/1/11

This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and has high

Video Understanding with Large Language Models: A Survey

2024/1/10

With the burgeoning growth of online video platforms and the escalating volume of video content, the

GPT-4V(ision) is a Generalist Web Agent, if Grounded

2024/1/9

The recent development of large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has b

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

2024/1/8

In the era of advanced multimodal learning, multimodal large language models (MLLMs) such as GPT-4V

AnyText: Multilingual Visual Text Generation And Editing

2024/1/5

Diffusion-model-based text-to-image generation has achieved impressive results recently. Although current

KwaiAgents: Generalized Information-seeking Agent System with Large Language Models

2024/1/4

Driven by curiosity, humans have continually sought to explore and understand the world around them,

Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4

2024/1/3

This paper introduces 26 guiding principles designed to streamline the process of querying and promp

Fast Inference of Mixture-of-Experts Language Models with Offloading

2024/1/2

With the widespread adoption of Large Language Models (LLMs), many deep learning practitioners are l
