2 months ago

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Abstract

This paper introduces GUI-Owl, a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models on ten GUI benchmarks across desktop and mobile environments, covering grounding, question answering, planning, decision-making, and procedural knowledge. GUI-Owl-7B achieves 66.4 on AndroidWorld and 29.4 on OSWorld. Building on this, we propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld, setting a new state-of-the-art for open-source GUI agent frameworks. GUI-Owl incorporates three key innovations: (1) Large-scale Environment Infrastructure: a cloud-based virtual environment spanning Android, Ubuntu, macOS, and Windows, enabling our Self-Evolving GUI Trajectory Production framework. This generates high-quality interaction data via automated query generation and correctness validation, leveraging GUI-Owl to refine trajectories iteratively, forming a self-improving loop. It supports diverse data pipelines and reduces manual annotation. (2) Diverse Foundational Agent Capabilities: by integrating UI grounding, planning, action semantics, and reasoning patterns, GUI-Owl supports end-to-end decision-making and can act as a modular component in multi-agent systems. (3) Scalable Environment RL: we develop a scalable reinforcement learning framework with fully asynchronous training for real-world alignment. We also introduce Trajectory-aware Relative Policy Optimization (TRPO) for online RL, achieving 34.9 on OSWorld. GUI-Owl and Mobile-Agent-v3 are open-sourced at https://github.com/X-PLUG/MobileAgent.
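
The self-improving loop described under innovation (1) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract only names the stages (automated query generation, rollout, correctness validation, iterative refinement), so every function and type below (`self_evolving_loop`, `Trajectory`, and the `toy_*` stand-ins) is a hypothetical placeholder, and the "model" is reduced to a single integer skill level purely to make the control flow runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; not taken from the paper's code.
Query = Tuple[str, int]  # (task name, difficulty)


@dataclass(frozen=True)
class Trajectory:
    query: str
    difficulty: int
    succeeded: bool


def self_evolving_loop(
    model: int,
    generate_queries: Callable[[], List[Query]],
    rollout: Callable[[int, Query], Trajectory],
    validate: Callable[[Trajectory], bool],
    refine: Callable[[int, List[Trajectory]], int],
    rounds: int,
) -> Tuple[int, List[Trajectory]]:
    """Generate queries, roll out, keep validated trajectories, refine, repeat."""
    dataset: List[Trajectory] = []
    for _ in range(rounds):
        queries = generate_queries()                        # automated query generation
        candidates = [rollout(model, q) for q in queries]   # model acts in the environment
        accepted = [t for t in candidates if validate(t)]   # correctness validation
        dataset.extend(accepted)                            # grow the training data
        model = refine(model, dataset)                      # updated model feeds the next round
    return model, dataset


# Toy stand-ins (assumptions): the "model" is an integer skill level.
def toy_generate_queries() -> List[Query]:
    return [(f"task-{d}", d) for d in range(1, 6)]


def toy_rollout(skill: int, query: Query) -> Trajectory:
    name, difficulty = query
    # The toy model solves tasks at most one level above its current skill.
    return Trajectory(name, difficulty, succeeded=difficulty <= skill + 1)


def toy_validate(traj: Trajectory) -> bool:
    return traj.succeeded


def toy_refine(skill: int, dataset: List[Trajectory]) -> int:
    # "Training" raises skill to the hardest validated task seen so far.
    return max([skill] + [t.difficulty for t in dataset])


if __name__ == "__main__":
    final_skill, data = self_evolving_loop(
        0, toy_generate_queries, toy_rollout, toy_validate, toy_refine, rounds=3
    )
    print(final_skill, len(data))
```

In this toy setting each round lets the model solve slightly harder tasks than the last, which mirrors the abstract's claim that validated trajectories feed back into the model; a real pipeline would replace the stand-ins with environment rollouts, a learned or rule-based validator, and actual fine-tuning.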
