5 months ago

Zhaoyang Liu, JingJing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, and 11 more

Abstract

Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential, yet progress is limited by the lack of large-scale, open-source computer use data and foundation models. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose computer use agents. We will release data, models, and code to advance future research: https://github.com/OpenGVLab/ScaleCUA.

ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
