ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Abstract
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.