HyperAI

World Action Model (WAM)

Date: 6 hours ago
Organization: NVIDIA
Paper URL: arxiv.org

The World Action Model (WAM) is a new foundation-model architecture for embodied intelligence and robotics. It was first proposed by NVIDIA in February 2026 in the paper "World Action Models are Zero-shot Policies". That paper introduces DreamZero, a 14B-parameter robot foundation model, and is the first to use the term World Action Model (WAM) explicitly to name this architecture. The paper argues that, unlike traditional vision-language-action (VLA) models, which map their inputs only to single-step actions, a WAM jointly predicts the future state of the world (as a video stream) and the robot's actions. By doing so it directly inherits prior knowledge of the physical world and achieves strong zero-shot generalization, which is why it can act as a zero-shot policy. NVIDIA has also published an explainer titled "What Is a World Action Model?" that describes the concept in more detail.
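The interface difference between a VLA-style policy and a WAM can be sketched in Python. This is a minimal illustration under our own assumptions: the names `Observation`, `WAMPrediction`, `VLAPolicy`, `WorldActionModel`, and `HoldStillWAM` are hypothetical and do not come from DreamZero or any NVIDIA API. The only point is that a VLA-style policy returns one action per call, while a WAM returns a predicted future together with the actions paired with it.

```python
from dataclasses import dataclass
from typing import List, Protocol

Frame = List[float]   # hypothetical: one frame reduced to a flat feature vector
Action = List[float]  # hypothetical: one action vector (e.g. joint targets)


@dataclass
class Observation:
    image: Frame              # current camera view
    instruction: str          # language instruction
    robot_state: List[float]  # proprioceptive state


@dataclass
class WAMPrediction:
    future_frames: List[Frame]  # what the world is predicted to look like next
    actions: List[Action]       # one action per predicted frame


class VLAPolicy(Protocol):
    def act(self, obs: Observation) -> Action:
        """Map the current observation to a single next action."""
        ...


class WorldActionModel(Protocol):
    def predict(self, obs: Observation, horizon: int) -> WAMPrediction:
        """Jointly predict `horizon` future frames and their actions."""
        ...


class HoldStillWAM:
    """Toy stand-in satisfying WorldActionModel: the world stays unchanged
    and every action is zero. A real WAM would be a large video model."""

    def predict(self, obs: Observation, horizon: int) -> WAMPrediction:
        zero = [0.0] * len(obs.robot_state)
        return WAMPrediction(
            future_frames=[list(obs.image) for _ in range(horizon)],
            actions=[list(zero) for _ in range(horizon)],
        )
```

Because `predict` returns frames and actions in one object, downstream code can never obtain an action without also receiving the imagined future it was planned against.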

In May 2026, Fudan University, Shanghai Innovation Academy, and the National University of Singapore published "World Action Models: The Next Frontier in Embodied AI", a systematic survey that explicitly defines a WAM as: "An embodied foundational model that unifies predictive state modeling with action generation, with the goal of training a joint distribution of future states and actions, not just the actions themselves."
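One way to write this distinction formally, using our own notation rather than the survey's: let $o_t$ be the visual observation at time $t$, $s_t$ the robot state, $\ell$ the language instruction, and $H$ the prediction horizon. A VLA-style policy models only the action distribution, whereas a WAM models the joint distribution of future observations and actions:

```latex
\begin{aligned}
\text{VLA:}\quad & p_\theta\left(a_t \mid o_t, s_t, \ell\right) \\
\text{WAM:}\quad & p_\theta\left(o_{t+1:t+H},\, a_{t:t+H-1} \mid o_t, s_t, \ell\right)
\end{aligned}
```

Marginalizing the WAM distribution over the future observations gives an action distribution, $p_\theta(a_{t:t+H-1} \mid o_t, s_t, \ell) = \int p_\theta(o_{t+1:t+H}, a_{t:t+H-1} \mid o_t, s_t, \ell)\, \mathrm{d}o_{t+1:t+H}$, so a WAM can still be used as a policy; the difference is that it must also account for how the world evolves.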

Taking NVIDIA DreamZero as an example, the underlying architecture of a WAM is essentially a large video generation model built on a video diffusion backbone such as Wan2.1 or NVIDIA Cosmos. Its core workflow can be broken into three steps:

Input: current camera view + language instruction + robot's current state
⬇️
[WAM core model (e.g., a 14B-parameter DiT)]
⬇️
Output of a single forward pass:

  1. Predicted future video frames (what the world will look like next)
  2. The robot's precise action for each predicted frame (6-DoF joint trajectories)
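The three-step workflow can be sketched as a single call that consumes the three inputs (visual observation, instruction, robot state) and returns both outputs. This is a toy, assumption-laden illustration: `toy_wam_forward`, `INSTRUCTION_GOALS`, the 2-D "frame", and the `gain` parameter are all hypothetical. A real WAM such as DreamZero uses a large diffusion transformer over video latents; here a fixed rule stands in for it, and the internal loop exists only for readability, whereas the source describes the real model producing both outputs together.

```python
from typing import Dict, List, Tuple

Vec = List[float]

# Hypothetical lookup standing in for learned language conditioning:
# each instruction names a 2-D target position for the gripper.
INSTRUCTION_GOALS: Dict[str, Vec] = {
    "move right": [1.0, 0.0],
    "move up": [0.0, 1.0],
}


def toy_wam_forward(
    frame: Vec,
    instruction: str,
    state: Vec,
    horizon: int,
    gain: float = 0.5,
) -> Tuple[List[Vec], List[Vec]]:
    """Return (future_frames, actions) from one call.

    Toy assumptions: `frame` is reduced to the 2-D position of an object the
    gripper is already holding, `state` is the gripper position, and each
    action is a displacement applied to both, so every predicted frame is
    exactly the consequence of the action paired with it.
    """
    goal = INSTRUCTION_GOALS[instruction]
    obj, grip = list(frame), list(state)
    frames: List[Vec] = []
    actions: List[Vec] = []
    for _ in range(horizon):
        # Action: move the gripper a fraction `gain` of the way to the goal.
        action = [gain * (g - p) for g, p in zip(goal, grip)]
        # The held object moves together with the gripper.
        grip = [p + a for p, a in zip(grip, action)]
        obj = [o + a for o, a in zip(obj, action)]
        actions.append(action)
        frames.append(list(obj))
    return frames, actions
```

For example, `toy_wam_forward([0.0, 0.0], "move right", [0.0, 0.0], horizon=2)` returns the frames `[[0.5, 0.0], [0.75, 0.0]]` and the actions `[[0.5, 0.0], [0.25, 0.0]]`: each predicted frame differs from the previous one by exactly the paired action.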

Because the two are predicted jointly, the robot's actions are tightly coupled to how the physical world evolves. To generate correct actions, the model must also imagine a future video that obeys physical regularities such as gravity, friction, and occlusion.
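The coupling argument can be made concrete with a toy training objective. This is a sketch under our own assumptions: DreamZero's actual training uses a video-diffusion objective, which is replaced here by a plain weighted sum of mean squared errors, and the names `mse`, `joint_wam_loss`, and the weight `lam` are hypothetical. Because the loss scores predicted frames and predicted actions together, a model cannot drive it to zero by getting the actions right while imagining an implausible future.

```python
from typing import List

Vec = List[float]


def mse(pred: List[Vec], target: List[Vec]) -> float:
    """Mean squared error over all elements of two equally shaped sequences."""
    if len(pred) != len(target):
        raise ValueError("sequence lengths differ")
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        if len(p_row) != len(t_row):
            raise ValueError("vector sizes differ")
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


def joint_wam_loss(
    pred_frames: List[Vec],
    pred_actions: List[Vec],
    true_frames: List[Vec],
    true_actions: List[Vec],
    lam: float = 1.0,
) -> float:
    """Toy joint objective: frame-prediction error plus weighted action error."""
    return mse(pred_frames, true_frames) + lam * mse(pred_actions, true_actions)
```

With correct actions but a wrong imagined future (for instance, frames offset by 1.0 along one axis), the frame term alone keeps the loss positive, which is the sense in which correct actions require a physically consistent predicted video.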
