Ashesh Jain; Amir R. Zamir; Silvio Savarese; Ashutosh Saxena

Abstract
Deep Recurrent Neural Network architectures, though remarkably capable at modeling sequences, lack an intuitive high-level spatio-temporal structure, even though many problems in computer vision inherently have such a structure and can benefit from it. Spatio-temporal graphs are a popular tool for imposing such high-level intuitions in the formulation of real-world problems. In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs with the sequence-learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled, as it can transform any spatio-temporal graph by following a set of well-defined steps. Evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, show improvement over the state of the art by a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks.
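To make the idea of casting a spatio-temporal graph as an RNN mixture more concrete, here is a minimal sketch of one plausible factoring step. It assumes that nodes of the same semantic type share one RNN and that each edge is handled by an RNN keyed on its kind and endpoint types. The abstract does not spell out its steps, so the grouping rule, the function name `factor_st_graph`, and the toy skeleton graph are all illustrative assumptions, and no network is trained here.

```python
from collections import defaultdict


def factor_st_graph(node_types, edges):
    """Group the nodes and edges of a spatio-temporal graph into shared RNN units.

    node_types: dict mapping node name -> semantic type (hypothetical labels).
    edges: iterable of (u, v, kind) with kind in {"spatial", "temporal"};
           a temporal edge links a node to itself across time steps.
    Returns (node_rnns, edge_rnns, wiring), where wiring lists, for each
    node RNN, the edge RNNs whose outputs it would consume.
    """
    # One node RNN per semantic type (assumed sharing rule).
    node_rnns = sorted(set(node_types.values()))
    edge_rnns = set()
    wiring = defaultdict(set)
    for u, v, kind in edges:
        type_u, type_v = node_types[u], node_types[v]
        # Edges of the same kind between the same pair of types share one RNN.
        type_a, type_b = sorted((type_u, type_v))
        key = (kind, type_a, type_b)
        edge_rnns.add(key)
        # Each endpoint's node RNN receives this edge RNN's output.
        wiring[type_u].add(key)
        wiring[type_v].add(key)
    return node_rnns, sorted(edge_rnns), {t: sorted(k) for t, k in wiring.items()}


# Toy skeleton graph (hypothetical): five body parts, three semantic types.
node_types = {
    "spine": "spine",
    "left_arm": "arm",
    "right_arm": "arm",
    "left_leg": "leg",
    "right_leg": "leg",
}
edges = [
    ("spine", "left_arm", "spatial"),
    ("spine", "right_arm", "spatial"),
    ("spine", "left_leg", "spatial"),
    ("spine", "right_leg", "spatial"),
    ("left_arm", "right_arm", "spatial"),
    ("left_leg", "right_leg", "spatial"),
] + [(name, name, "temporal") for name in node_types]

node_rnns, edge_rnns, wiring = factor_st_graph(node_types, edges)
```

Under these assumptions the number of RNN units depends on the number of semantic types rather than on the number of nodes, which is one way a construction of this kind could stay scalable as the graph grows.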
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| human-pose-forecasting-on-human36m | SRNN | MAR, walking, 1,000ms: 2.13; MAR, walking, 400ms: 1.30 |
| skeleton-based-action-recognition-on-cad-120 | S-RNN (5-shot) | Accuracy: 85.4% |