Core Concepts
What Is RLHF (Reinforcement Learning from Human Feedback)?
Reinforcement learning from human feedback (RLHF) is a training method that uses human preferences to fine-tune a model’s behavior. It works in three steps: people rank or compare model outputs, a reward model is trained to predict those preferences, and the original model is then optimized, typically with a reinforcement learning algorithm such as PPO, to produce outputs the reward model scores highly. RLHF was central to making ChatGPT and similar assistants follow instructions and behave more helpfully.
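To make the reward-model step concrete, here is a minimal sketch of how a reward model might learn from pairwise human preferences. It assumes a Bradley-Terry style loss, -log(sigmoid(r_chosen - r_rejected)), which is a common choice but not the only one. The linear reward model, the two-number feature vectors (helpfulness, rudeness), and the toy data are all hypothetical, chosen only for illustration; real systems score full text with a neural network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    # Hypothetical linear reward model: a weighted sum of response features.
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    # Bradley-Terry style loss: small when the chosen response outscores the rejected one.
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # Plain stochastic gradient descent on the pairwise loss.
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(w_i) = -(1 - sigmoid(margin)) * (chosen_i - rejected_i)
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Toy preference data (assumed): each pair is (preferred response, rejected response),
# with features [helpfulness, rudeness].
pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train_reward_model(pairs, dim=2)
for chosen, rejected in pairs:
    print(reward(weights, chosen) > reward(weights, rejected))
```

After training, the reward model scores each preferred response above its rejected counterpart. In full RLHF, a learned reward like this would then serve as the optimization signal for the language model itself; that step is not shown here.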