Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
Yining Hong Rui Sun Bingxuan Li Xingcheng Yao Maxine Wu Alexander Chien Da Yin Ying Nian Wu Zhecan James Wang Kai-Wei Chang

Abstract
AI agents today are mostly siloed - they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning, and action - but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that tightly integrates realistic 3D indoor and outdoor environments with functional web interfaces. Building upon this platform, we construct and release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks - including cooking, navigation, shopping, tourism, and geolocation - all requiring coordinated reasoning across physical and digital realms for systematic assessment of cross-domain intelligence. Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access. All datasets, code, and websites are publicly available at our project page: https://embodied-web-agent.github.io/.