HyperAI

Training language models to follow instructions with human feedback

Abstract

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B-parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
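The abstract turns labeler rankings of model outputs into a training signal and then fine-tunes the supervised model against that signal with reinforcement learning. The Python below is a minimal sketch of those two signals under stated assumptions, not the authors' code: it assumes a scalar reward model already scores each output, trains it with a pairwise logistic (Bradley-Terry-style) loss over all C(K, 2) pairs implied by a ranking of K outputs, and shapes the RL reward as the reward-model score minus a penalty on the log-probability ratio between the RL policy and the frozen supervised model. The function names and the default `beta` are hypothetical, and a real setup would compute these quantities with neural networks and optimize the policy with an RL algorithm rather than on plain floats.

```python
import math
from itertools import combinations


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(ranked_rewards: list[float]) -> float:
    """Average -log(sigmoid(r_better - r_worse)) over all C(K, 2) pairs.

    ranked_rewards holds reward-model scores for K outputs to one prompt,
    ordered from the labeler's most preferred to least preferred output.
    """
    if len(ranked_rewards) < 2:
        raise ValueError("a ranking needs at least two outputs")
    pairs = list(combinations(range(len(ranked_rewards)), 2))
    # combinations yields (i, j) with i < j, so output i is the preferred one.
    # -log(sigmoid(m)) equals softplus(-m) for the margin m = r_i - r_j.
    total = sum(softplus(-(ranked_rewards[i] - ranked_rewards[j])) for i, j in pairs)
    return total / len(pairs)


def kl_shaped_reward(rm_score: float, logp_policy: float, logp_sft: float,
                     beta: float = 0.02) -> float:
    """Reward-model score minus a single-sample KL-style penalty.

    logp_policy and logp_sft are log-probabilities of the same response under
    the RL policy and the frozen supervised model; beta (hypothetical default)
    controls how strongly the policy is kept close to the supervised model.
    """
    return rm_score - beta * (logp_policy - logp_sft)
```

For example, `pairwise_ranking_loss([2.0, 1.0, 0.0])` scores three outputs already in the labeler's order and yields a smaller loss than the reversed list, and `kl_shaped_reward` leaves the score unchanged when the policy assigns the response the same log-probability as the supervised model. The stable `softplus` keeps the loss finite even for very large reward margins.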

Code Repositories (mentioned in GitHub)

tatsu-lab/alpaca_farm (pytorch)
hiyouga/llama-efficient-tuning (pytorch)
laion-ai/open-assistant
daniel-furman/sft-demos (pytorch)
ggml-org/llama.cpp (pytorch)
ggerganov/llama.cpp (pytorch)
grantslatton/llama.cpp
longhao-chen/aicas2024 (pytorch)
tatsu-lab/linguistic_calibration (pytorch)

Benchmarks

Benchmark | Methodology | Metrics
question-answering-on-timequestions | InstructGPT | P@1: 22.4
question-answering-on-tiq | InstructGPT | P@1: 23.6
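P@1 (precision at 1) counts a question as answered correctly only when the system's top-ranked answer is among the gold answers, so the benchmark scores above presumably read as the percentage of questions answered correctly at rank 1. The sketch below illustrates the metric under simplifying assumptions: the function name is hypothetical, answers are compared by exact string match, and an empty prediction list counts as a miss; the actual benchmark scripts may normalize answers differently.

```python
def precision_at_1(ranked_predictions: list[list[str]],
                   gold_answers: list[set[str]]) -> float:
    """Percentage of questions whose top-ranked prediction is a gold answer.

    Assumes exact string match; real benchmarks may normalize answers first.
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("need one prediction list per question")
    if not gold_answers:
        raise ValueError("no questions to score")
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if preds and preds[0] in gold  # only the first-ranked answer counts
    )
    return 100.0 * hits / len(gold_answers)


# Hypothetical example: only the first question is answered correctly at rank 1.
preds = [["1969", "1970"], ["Paris"], []]
gold = [{"1969"}, {"Lyon"}, {"Rome"}]
print(round(precision_at_1(preds, gold), 1))  # 33.3
```

Lower-ranked candidates such as "1970" never affect the score, which is what separates P@1 from metrics like Hits@5 or mean reciprocal rank.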
