8 months ago

Computer Vision

Image Understanding

Depth Estimation

Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

Abstract

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single in-the-wild image captured in an uncontrolled environment. Notably, our method runs on datasets without any scene- or object-level 3D supervision. Our key insight is that considering humans and objects jointly gives rise to "3D common sense" constraints that can be used to resolve ambiguity. In particular, we introduce a scale loss that learns the distribution of object size from data; an occlusion-aware silhouette re-projection loss to optimize object pose; and a human-object interaction loss to capture the spatial layout of objects with which humans interact. We empirically validate that our constraints dramatically reduce the space of likely 3D spatial configurations. We demonstrate our approach on challenging, in-the-wild images of humans interacting with large objects (such as bicycles, motorcycles, and surfboards) and handheld objects (such as laptops, tennis rackets, and skateboards). We quantify the ability of our approach to recover human-object arrangements and outline remaining challenges in this domain. The project webpage can be found at https://jasonyzhang.com/phosa.
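
The abstract names an occlusion-aware silhouette re-projection loss but does not spell out its form. The sketch below is one plausible reading, not the authors' implementation: it compares a rendered object silhouette with a target mask and ignores pixels flagged as occluded by other instances, so the object is not penalized for parts hidden behind a person. The function name, the boolean `occluded` mask, and the squared-error form are all assumptions made for illustration.

```python
import numpy as np


def occlusion_aware_silhouette_loss(rendered, target, occluded):
    """Mean squared silhouette error over pixels NOT marked as occluded.

    rendered: HxW array, rendered object silhouette in [0, 1] (hypothetical input)
    target:   HxW array, observed object mask in [0, 1] (hypothetical input)
    occluded: HxW boolean array, True where another instance hides the object
    """
    rendered = np.asarray(rendered, dtype=float)
    target = np.asarray(target, dtype=float)
    visible = ~np.asarray(occluded, dtype=bool)

    n_visible = visible.sum()
    if n_visible == 0:
        # Nothing is observable, so the silhouette gives no evidence.
        return 0.0

    squared_error = (rendered - target) ** 2
    return float(squared_error[visible].sum() / n_visible)


# Toy 2x2 example: the rendering covers one pixel the target mask does not.
rendered = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
no_occlusion = [[False, False], [False, False]]
occluder_on_mismatch = [[False, True], [False, False]]

plain_loss = occlusion_aware_silhouette_loss(rendered, target, no_occlusion)
masked_loss = occlusion_aware_silhouette_loss(rendered, target, occluder_on_mismatch)
```

In the toy example the mismatched pixel contributes to the loss only when it is treated as visible; once it is marked occluded, the disagreement is ignored. A differentiable renderer would be needed to use a term like this for pose optimization, which is outside the scope of this sketch.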

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild