Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

Abstract

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare both methods. Recent work has already begun this comparison, but many questions remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.
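
The difference between the two kinds of feedback can be illustrated with a small sketch. It is not the authors' implementation: the `Solution` class, the example steps, and all scores below are hypothetical, and multiplying per-step scores into a single solution score is an assumption made here for illustration rather than something the abstract specifies. An outcome-level score says only how plausible the final result looks, while step-level scores also locate the first suspicious step.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, List, Optional


@dataclass
class Solution:
    """A candidate solution with hypothetical reward-model scores."""
    steps: List[str]
    step_scores: List[float]  # assumed per-step correctness probabilities
    final_score: float        # assumed score for the final result only


def outcome_score(sol: Solution) -> float:
    # Outcome supervision: one signal for the whole solution.
    return sol.final_score


def process_score(sol: Solution) -> float:
    # Process supervision: one signal per step. Aggregating by product
    # is an assumption of this sketch.
    return prod(sol.step_scores)


def first_flagged_step(sol: Solution, threshold: float = 0.5) -> Optional[int]:
    # Index of the first step scored below the threshold, or None.
    for i, score in enumerate(sol.step_scores):
        if score < threshold:
            return i
    return None


def best_of_n(candidates: List[Solution],
              scorer: Callable[[Solution], float]) -> Solution:
    # Pick the candidate that the given scorer ranks highest.
    return max(candidates, key=scorer)


# Made-up candidates for the equation 2x + 4 = 10.
sound = Solution(
    steps=["2x = 10 - 4", "2x = 6", "x = 3"],
    step_scores=[0.95, 0.95, 0.9],
    final_score=0.6,
)
flawed = Solution(
    steps=["2x = 10 + 4", "2x = 14", "x = 7"],
    step_scores=[0.2, 0.9, 0.9],
    final_score=0.7,
)

picked_by_outcome = best_of_n([sound, flawed], outcome_score)
picked_by_process = best_of_n([sound, flawed], process_score)
```

With these invented numbers, the outcome scorer prefers `flawed` while the process scorer prefers `sound`, and `first_flagged_step(flawed)` points at the step where the sign error was introduced. The numbers are chosen to make the contrast visible and carry no empirical weight.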

Code Repositories

consequentai/fneval (mentioned in GitHub)
openai/prm800k (official; mentioned in GitHub)
gentopia-ai/gentopia (mentioned in GitHub)

Benchmarks

Benchmark: math-word-problem-solving-on-math-minival
Methodology: Process Supervision (GPT-4)
Metrics: Accuracy: 78.2
