HyperAI

4 months ago

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Abstract

Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. While recent works have leveraged large language models (LLMs) for multi-agent planning, a few have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench, the first hierarchical benchmark tailored for embodied multi-agent cooperation, featuring three structured levels: agent activation, task planning, and trajectory perception. VIKI-Bench includes diverse robot embodiments, multi-view visual observations, and structured supervision signals to evaluate reasoning grounded in visual inputs. To demonstrate the utility of VIKI-Bench, we propose VIKI-R, a two-stage framework that fine-tunes a pretrained vision-language model (VLM) using Chain-of-Thought annotated demonstrations, followed by reinforcement learning under multi-level reward signals. Our extensive experiments show that VIKI-R significantly outperforms baseline methods across all task levels. Furthermore, we show that reinforcement learning enables the emergence of compositional cooperation patterns among heterogeneous agents. Together, VIKI-Bench and VIKI-R offer a unified testbed and method for advancing multi-agent, visual-driven cooperation in embodied AI systems.
