Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Mohan Kankanhalli, Yi Yang, Hehe Fan

Abstract

Point cloud videos are irregular and unordered along the spatial dimension, and points emerge inconsistently across frames. To capture the dynamics in point cloud videos, point tracking is usually employed. However, as points may flow in and out across frames, computing accurate point trajectories is extremely difficult. Moreover, tracking usually relies on point colors and thus may fail to handle colorless point clouds. In this paper, to avoid point tracking, we propose a novel Point 4D Transformer (P4Transformer) network to model raw point cloud videos. Specifically, P4Transformer consists of (i) a point 4D convolution to embed the spatio-temporal local structures present in a point cloud video and (ii) a transformer to capture the appearance and motion information across the entire video by performing self-attention on the embedded local features. In this fashion, related or similar local areas are merged by attention weights rather than by explicit tracking. Extensive experiments, including 3D action recognition and 4D semantic segmentation, on four benchmarks demonstrate the effectiveness of our P4Transformer for point cloud video modeling.
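
To make the second component more concrete, the following is a minimal sketch of single-head scaled dot-product self-attention applied to a set of already-embedded local features. It is not the authors' implementation: the feature values, the projection matrices `w_q`, `w_k`, `w_v`, the sizes, and all function names are hypothetical, and the point 4D convolution that would produce the features is replaced here by random numbers. It only illustrates how each local area can be merged with related areas through attention weights instead of explicit tracking.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(features, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    features: one row per embedded spatio-temporal local area.
    Returns the merged features and the attention weight matrix.
    """
    q = matmul(features, w_q)
    k = matmul(features, w_k)
    v = matmul(features, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Similarity of every local area to every other local area in the video.
    scores = [
        [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted mix of all value rows.
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned parameters or embedded features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_local_areas, dim = 6, 4  # assumed sizes, chosen only for illustration
features = random_matrix(num_local_areas, dim, rng)
w_q = random_matrix(dim, dim, rng)
w_k = random_matrix(dim, dim, rng)
w_v = random_matrix(dim, dim, rng)

merged, weights = self_attention(features, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, so every merged feature is a convex combination of the value vectors of all local areas in the video, with larger weights going to areas whose queries and keys are more similar.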

Benchmarks

Benchmark: 3d-action-recognition-on-ntu-rgb-d-1
Methodology: P4Transformer
Metrics: Cross Subject Accuracy: 90.2; Cross View Accuracy: 96.4
