3 months ago

π^3: Scalable Permutation-Equivariant Visual Geometry Learning

Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He

Abstract

We introduce π^3, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, π^3 employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
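
To make the term concrete: a function over a set of views is permutation-equivariant if reordering the inputs reorders the outputs in the same way and changes nothing else. The toy layer below is only an illustration of that property, not the π^3 architecture; the names `equivariant_layer` and `permute`, the feature sizes, and the mean-offset operation are all assumptions chosen for brevity.

```python
import math
import random

def equivariant_layer(views):
    """Toy set layer (hypothetical): offset each view's features by the mean over all views."""
    n = len(views)
    dim = len(views[0])
    # math.fsum is exactly rounded, so the pooled mean does not depend on view order.
    mean = [math.fsum(v[d] for v in views) / n for d in range(dim)]
    # The same operation is applied to every view; no view acts as a reference.
    return [[v[d] - mean[d] for d in range(dim)] for v in views]

def permute(items, perm):
    """Reorder a list so that position k holds items[perm[k]]."""
    return [items[i] for i in perm]

random.seed(0)
views = [[random.random() for _ in range(4)] for _ in range(5)]  # 5 views, 4 features each
perm = [3, 0, 4, 1, 2]

# Equivariance check: permuting after the layer equals permuting before it.
out_then_perm = permute(equivariant_layer(views), perm)
perm_then_out = equivariant_layer(permute(views, perm))
is_equivariant = out_then_perm == perm_then_out
print(is_equivariant)
```

A layer that treated the first view specially (for example, subtracting `views[0]` instead of the mean) would fail this check, which is the kind of ordering dependence the abstract attributes to reference-view methods.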

