3 months ago

VeriGUI: Verifiable Long-Chain GUI Dataset

Abstract

Recent studies have delved into constructing autonomous agents capable of performing complex Graphical User Interface (GUI)-based computer tasks, with the potential to revolutionize human-computer interaction. Despite encouraging results, existing efforts mainly focus on short-term interactions and rely on outcome-only verification, thereby limiting their scalability in real-world GUI applications that demand long-horizon task decomposition and execution. In this work, we introduce VeriGUI, a novel verifiable long-chain GUI dataset designed to facilitate the development and evaluation of generalist GUI agents operating in realistic computer environments. Our dataset emphasizes two critical dimensions: (1) long-chain complexity, with tasks decomposed into a sequence of interdependent subtasks spanning hundreds of steps, explicitly designed to allow any subtask to serve as a valid starting point; and (2) subtask-level verifiability, which enables diverse exploration strategies within each subtask, while ensuring that each subtask-level goal remains verifiable and consistent. The dataset consists of GUI task trajectories across both desktop and web, annotated by human experts. Extensive experiments on VeriGUI using various agents with different foundation models reveal significant performance gaps in handling long-horizon tasks, highlighting the need for more robust planning and decision-making capabilities in GUI agents.
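
To make the contrast between outcome-only and subtask-level verification concrete, here is a minimal Python sketch. It is not the VeriGUI schema or evaluation code: the names `Subtask`, `LongChainTask`, `evaluate`, and `outcome_only`, and the dictionary-based state, are all hypothetical and chosen only for illustration. The sketch assumes each subtask carries a check on its goal state (not on the steps taken), and that evaluation can begin at any subtask if the caller supplies a matching starting state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an environment state; not the VeriGUI format.
State = Dict[str, str]


@dataclass
class Subtask:
    instruction: str
    # Checks only whether the subtask-level goal holds in the resulting state,
    # so the agent is free to reach it by any sequence of GUI actions.
    verify: Callable[[State], bool]


@dataclass
class LongChainTask:
    instruction: str
    subtasks: List[Subtask]


def evaluate(
    task: LongChainTask,
    run_subtask: Callable[[Subtask, State], State],
    state: State,
    start: int = 0,
) -> List[bool]:
    """Run subtasks from index `start` and return one pass/fail flag per subtask.

    Assumption: `state` already reflects the completion of subtasks before `start`.
    """
    results = []
    for sub in task.subtasks[start:]:
        state = run_subtask(sub, state)
        results.append(sub.verify(state))
    return results


def outcome_only(results: List[bool]) -> bool:
    """Outcome-only verification looks at the final check alone."""
    return bool(results) and results[-1]
```

With per-subtask flags, a failure in the middle of a long chain stays visible even when the final check happens to pass, which an outcome-only score would hide.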
