


SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning


Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for scaling RLVR is a high-quality problem set with precise and verifiable answers. However, the scarcity of well-crafted human-labeled math problems and the limited verifiability of answers in existing distillation-oriented synthetic datasets limit their effectiveness in RL. Additionally, most problem synthesis strategies expand the problem set indiscriminately without considering the model's capabilities, which makes them inefficient at generating useful questions. To mitigate this issue, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. Specifically, we define weaknesses as questions that the model consistently fails to learn across its iterative sampling during RL training. We then extract the core concepts from these failure cases and synthesize new problems that strengthen the model's weak areas in subsequent augmented training, enabling it to focus on and gradually overcome its weaknesses. Without relying on external knowledge distillation, our framework enables robust generalization by empowering the model to self-identify and address its weaknesses in RL, yielding average performance gains of 10.0% and 7.7% for 7B and 32B models, respectively, across eight mainstream reasoning benchmarks.
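The weakness-identification step can be illustrated with a minimal sketch. The abstract does not give SwS's exact failure criterion, so everything below is an assumption for illustration: the function names (`verifiable_reward`, `record_epoch`, `find_weaknesses`), the 0.5 accuracy threshold, the two-epoch minimum, and the "no improvement from first to last epoch" rule are hypothetical, and a normalized string match stands in for a real math-answer verifier. The later stages (extracting core concepts from failures and synthesizing new problems with an LLM) are not shown.

```python
from collections import defaultdict


def verifiable_reward(predicted: str, reference: str) -> int:
    """Binary RLVR-style reward: 1 if the predicted final answer matches the reference.

    Assumption: a whitespace- and case-insensitive string match stands in for a real
    math-equivalence verifier.
    """
    def normalize(s: str) -> str:
        return "".join(s.split()).lower()
    return int(normalize(predicted) == normalize(reference))


def record_epoch(history: dict, problem_id: str, rewards: list) -> None:
    """Append this epoch's rollout accuracy (fraction of reward-1 samples) for one problem."""
    history[problem_id].append(sum(rewards) / len(rewards))


def find_weaknesses(history: dict, acc_threshold: float = 0.5, min_epochs: int = 2) -> list:
    """Return IDs of problems the model 'consistently fails to learn'.

    Hypothetical criterion: at least `min_epochs` recorded epochs, accuracy below
    `acc_threshold` in every epoch, and no improvement from the first to the last epoch.
    """
    weak = []
    for problem_id, accs in history.items():
        if len(accs) < min_epochs:
            continue  # not enough sampling history yet
        always_low = all(a < acc_threshold for a in accs)
        not_improving = accs[-1] <= accs[0]
        if always_low and not_improving:
            weak.append(problem_id)
    return sorted(weak)


# Usage: four rollouts per epoch for three problems whose reference answer is "42".
history = defaultdict(list)
reference = "42"
rollouts = {
    "p1": [["42", "41", "7", "0"], ["40", "42", "1", "3"]],  # 0.25 -> 0.25: stuck
    "p2": [["42", "1", "2", "3"], ["42", "42", "42", "5"]],  # 0.25 -> 0.75: improving
    "p3": [["0", "0", "0", "0"]],                            # only one epoch so far
}
for pid, epochs in rollouts.items():
    for preds in epochs:
        record_epoch(history, pid, [verifiable_reward(p, reference) for p in preds])

print(find_weaknesses(history))  # ['p1']
```

Only `p1` is flagged: `p2` crosses the threshold in its second epoch, and `p3` has too little history to judge. Requiring failure across several sampling rounds, rather than a single epoch, reflects the abstract's emphasis on questions the model *consistently* fails to learn.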

Code Repositories

mastervito/sws (official)

