This chapter explores DeepSeek-R1, a model that uses reinforcement learning to strengthen the reasoning capabilities of LLMs without supervised fine-tuning. It discusses the model's strong benchmark results, its limitations, and the use of knowledge distillation to transfer its reasoning abilities to smaller models.
DeepSeek-R1 is trained with pure reinforcement learning, achieving strong reasoning abilities without a supervised fine-tuning stage.
Knowledge distillation transfers the reasoning capabilities of DeepSeek-R1 to smaller, more efficient models.
The model demonstrates that strong reasoning capabilities can emerge through reinforcement learning alone.
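To make the pure-RL claim concrete, here is a minimal sketch of the kind of rule-based reward signal such training relies on: the model is rewarded for following a think-then-answer format and for producing a verifiably correct final answer, with no human-labeled reasoning traces. The tag names, weights, and function are illustrative assumptions, not DeepSeek's exact implementation.

```python
import re

def reward(completion: str, ground_truth: str) -> float:
    """Illustrative rule-based reward (tags and weights are assumptions):
    a small format reward plus a larger accuracy reward."""
    score = 0.0
    # Format reward: reasoning in <think>...</think>, answer in <answer>...</answer>.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
        score += 0.1
    # Accuracy reward: exact match on the extracted final answer.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m and m.group(1).strip() == ground_truth.strip():
        score += 1.0
    return score

good = "<think>2 + 2 is 4</think><answer>4</answer>"
bad = "<think>maybe 5</think><answer>5</answer>"
print(reward(good, "4"))  # well-formatted and correct: 1.1
print(reward(bad, "4"))   # well-formatted but wrong: 0.1
```

Because the reward is computed by simple checks rather than a learned reward model, it is cheap to evaluate at scale and hard for the policy to game, which is what makes RL without supervised fine-tuning feasible.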
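The distillation idea can be sketched with the classic logit-matching formulation: the student is trained to minimize the KL divergence between its temperature-softened output distribution and the teacher's. Note this is the textbook objective (Hinton-style distillation), shown here to convey the concept; distilling a large reasoning model can also be done by fine-tuning the student on teacher-generated outputs. All names and toy logits below are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions; minimizing it
    pulls the student's outputs toward the teacher's soft targets."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy check: a student aligned with the teacher incurs a small loss,
# a mismatched one a much larger loss.
teacher = [4.0, 1.0, 0.5]
aligned = [4.1, 0.9, 0.6]
mismatched = [0.5, 4.0, 1.0]
print(distillation_loss(teacher, aligned))     # small
print(distillation_loss(teacher, mismatched))  # larger
```

The softened targets carry more information than hard labels (relative probabilities over wrong answers included), which is why a small student can inherit much of a large teacher's behavior.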