Core Concepts

What Is RLHF (Reinforcement Learning from Human Feedback)?

Reinforcement learning from human feedback (RLHF) is a training method that uses human preferences to fine-tune a model’s behavior. People rank model outputs, a reward model learns those preferences, and the model is optimized against it. RLHF was central to making ChatGPT and similar assistants follow instructions and behave more helpfully.

What Is RLHF (Reinforcement Learning from Human Feedback)?

Related topics

Further reading