Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs


Abstract

Software engineering (SWE) has recently emerged as a crucial testbed for next-generation LLM agents, demanding inherent capabilities in two critical dimensions: sustained iterative problem-solving (e.g., >50 interaction rounds) and long-context dependency resolution (e.g., >32k tokens). However, the data curation process in SWE remains notoriously time-consuming, as it heavily relies on manual annotation for code file filtering and on the setup of dedicated runtime environments to execute and validate unit tests. Consequently, most existing datasets are limited to only a few thousand GitHub-sourced instances. To this end, we propose an incremental, automated data-curation pipeline that systematically scales both the volume and diversity of SWE datasets. Our dataset comprises 10,169 real-world Python task instances from 2,531 distinct GitHub repositories, each accompanied by a task specified in natural language and a dedicated runtime-environment image for automated unit-test validation. We have carefully curated over 8,000 successfully runtime-validated training trajectories from our proposed SWE dataset. When fine-tuning the Skywork-SWE model on these trajectories, we uncover a striking data scaling phenomenon: the trained model's software engineering performance continues to improve as the data size increases, showing no signs of saturation. Notably, our Skywork-SWE model achieves 38.0% pass@1 accuracy on the SWE-bench Verified benchmark without using verifiers or multiple rollouts, establishing a new state-of-the-art (SOTA) among Qwen2.5-Coder-32B-based LLMs built on the OpenHands agent framework. Furthermore, with the incorporation of test-time scaling techniques, the performance further improves to 47.0% accuracy, surpassing the previous SOTA results for sub-32B-parameter models. We release the Skywork-SWE-32B model checkpoint to accelerate future research.
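Each curated task instance ships with a dedicated runtime-environment image so its unit tests can be executed and validated automatically. As a hedged sketch of what that validation step could look like, the snippet below runs a test command inside a container; the docker invocation, image field, test command, and timeout are illustrative assumptions, not the paper's actual pipeline:

```python
import subprocess

def runtime_validate(image: str, test_cmd: str, timeout_s: int = 1800) -> bool:
    """Run one task instance's unit tests inside its dedicated runtime image.

    Hypothetical sketch: the real pipeline's image naming, test commands,
    and resource limits are not specified in the abstract.
    """
    try:
        proc = subprocess.run(
            ["docker", "run", "--rm", image, "bash", "-lc", test_cmd],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hung test suites as validation failures
    return proc.returncode == 0
```

The pass@1 figure reported above follows the standard unbiased pass@k estimator of Chen et al. (2021) at k = 1; with a single rollout per task it reduces to the fraction of SWE-bench Verified instances whose generated patch passes the repository's unit tests. A minimal sketch of the general estimator (the function below is for reference; the paper's own evaluation code is not shown here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: candidate patches sampled per task
    c: candidates that pass all unit tests
    k: attempt budget being scored
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing patch
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one rollout per task (n = k = 1), pass@1 is simply the resolved
# fraction: the mean of pass_at_k(1, c_i, 1) over all task instances.
```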
