Parmar Paritosh; Morris Brendan

Abstract
Spatiotemporal representations learned using 3D convolutional neural networks (CNNs) are currently used in state-of-the-art approaches for action-related tasks. However, 3D-CNNs are notorious for being memory and compute intensive compared with simpler 2D-CNN architectures. We propose to hallucinate spatiotemporal representations from a 3D-CNN teacher with a 2D-CNN student. By requiring the 2D-CNN to predict the future and intuit upcoming activity, it is encouraged to gain a deeper understanding of actions and how they evolve. The hallucination task is treated as an auxiliary task, which can be used with any other action-related task in a multitask learning setting. Thorough experimental evaluation shows that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition tasks. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as with limited computing power and/or lower bandwidth. Codebase is available here: https://github.com/ParitoshParmar/HalluciNet.
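The multitask setup described in the abstract can be sketched as a main-task loss plus a weighted hallucination loss. This is a minimal illustration, not the authors' implementation: the abstract does not state the form of either loss, so the mean squared error between student and teacher features, the cross-entropy main loss, the weighted sum, and all function and parameter names (`hallucination_loss`, `multitask_loss`, `weight`) are assumptions made for this example.

```python
import math


def hallucination_loss(student_feat, teacher_feat):
    """Mean squared error between the 2D-CNN student's predicted features
    and the 3D-CNN teacher's features (assumed loss form)."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("student and teacher features must have the same length")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared_errors) / len(squared_errors)


def cross_entropy(logits, label):
    """Cross-entropy for one sample, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(logits, label, student_feat, teacher_feat, weight=1.0):
    """Main-task loss plus the auxiliary hallucination loss.

    `weight` is a hypothetical balancing factor; the abstract does not
    specify how the two terms are combined.
    """
    main = cross_entropy(logits, label)
    aux = hallucination_loss(student_feat, teacher_feat)
    return main + weight * aux


if __name__ == "__main__":
    # Toy values: two-class logits and two-dimensional feature vectors.
    total = multitask_loss(
        logits=[0.0, 0.0],
        label=0,
        student_feat=[1.0, 2.0],
        teacher_feat=[1.0, 4.0],
        weight=0.5,
    )
    print(f"total loss: {total:.4f}")
```

Under these assumptions the teacher is needed only to produce target features during training; at inference time only the 2D-CNN student runs, which is what makes deployment in resource-constrained scenarios plausible.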
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| action-recognition-in-videos-on-ucf101 | HalluciNet (ResNet-50) | 3-fold Accuracy: 79.83 |
| scene-recognition-on-yup | HalluciNet (ResNet-50) | Accuracy (%): 84.44 |