HyperAIHyperAI

Command Palette

Search for a command to run...

5 months ago

Multi-Turn Code Generation Through Single-Step Rewards

Arnav Kumar Jain Gonzalo Gonzalez-Pumariega Wayne Chen Alexander M Rush Wenting Zhao Sanjiban Choudhury

Multi-Turn Code Generation Through Single-Step Rewards

Abstract

We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, CODE, that solves multi-turn code generation using only single-step rewards. Our key insight is that code generation is a one-step recoverable MDP, where the correct code can be recovered from any intermediate code state in a single turn. CODE iteratively trains both a generator to provide code solutions conditioned on multi-turn execution feedback and a verifier to score the newly generated code. Experimental evaluations show that our approach achieves significant improvements over state-of-the-art baselines. We provide analysis of the design choices of the reward models and policy, and show the efficacy of CODE at utilizing the execution feedback.

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Multi-Turn Code Generation Through Single-Step Rewards | Papers | HyperAI