


AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova


Abstract

Learning to represent videos is a very challenging task, both algorithmically and computationally. Standard video CNN architectures have been designed either by directly extending architectures devised for image understanding to include the time dimension, using modules such as 3D convolutions, or by using a two-stream design to capture both appearance and motion in videos. We interpret a video CNN as a collection of multi-stream convolutional blocks connected to each other, and propose an approach for automatically finding neural architectures with better connectivity and spatio-temporal interactions for video understanding. This is done by evolving a population of overly-connected architectures guided by connection weight learning. We search for architectures that combine representations abstracting different input types (i.e., RGB and optical flow) at multiple temporal resolutions, allowing different types or sources of information to interact with each other. Our method, referred to as AssembleNet, outperforms prior approaches on public video datasets, in some cases by a large margin. We obtain 58.6% mAP on Charades and 34.27% top-1 accuracy on Moments-in-Time.
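The connectivity idea above, where one block receives several streams (for example RGB and optical flow at different temporal resolutions) and learned connection weights guide which edges survive evolution, can be illustrated with a toy sketch. It assumes, for illustration only, that incoming streams are combined by a sigmoid-gated weighted sum, that mismatched temporal lengths are handled by nearest-neighbour resampling, and that a mutation prunes edges whose gate falls below a fixed threshold of 0.2 before adding one random edge. The names `Node`, `fuse`, `prune`, `mutate` and `resample`, the 0.2 threshold, and the use of one scalar per frame in place of real feature tensors are all hypothetical; this is not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass, field


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def resample(seq: list[float], length: int) -> list[float]:
    """Nearest-neighbour temporal resampling (assumption) so that streams
    sampled at different frame rates can be summed frame by frame."""
    n = len(seq)
    return [seq[min(n - 1, (i * n) // length)] for i in range(length)]


@dataclass
class Node:
    """Hypothetical block in a multi-stream architecture graph."""
    name: str
    # source block name -> raw (pre-sigmoid) learnable connection weight
    inputs: dict[str, float] = field(default_factory=dict)


def fuse(node: Node, features: dict[str, list[float]], length: int) -> list[float]:
    """Combine incoming streams as sum_i sigmoid(w_i) * x_i (assumed fusion rule)."""
    out = [0.0] * length
    for src, w in node.inputs.items():
        x = resample(features[src], length)
        gate = sigmoid(w)
        for t in range(length):
            out[t] += gate * x[t]
    return out


def prune(node: Node, threshold: float = 0.2) -> Node:
    """Return a copy keeping only edges whose gate sigmoid(w) >= threshold."""
    kept = {s: w for s, w in node.inputs.items() if sigmoid(w) >= threshold}
    return Node(node.name, kept)


def mutate(node: Node, candidates: list[str], rng: random.Random) -> Node:
    """Toy mutation: drop weak edges, then add one random new edge with weight 0.
    The parent node is left unchanged."""
    child = prune(node)
    child.inputs.setdefault(rng.choice(candidates), 0.0)
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    # A slow RGB stream (2 frames) and a fast flow stream (4 frames).
    features = {"rgb_slow": [1.0, 2.0], "flow_fast": [0.5, 0.5, 0.5, 0.5]}
    node = Node("block3", {"rgb_slow": 2.0, "flow_fast": -3.0})
    print(fuse(node, features, length=4))
    print(mutate(node, ["flow_fast", "rgb_slow"], rng).inputs)
```

In this sketch the weakly weighted `flow_fast` edge (gate of about 0.05) is removed by `prune`, which mirrors the general idea of letting learned connection weights steer which connections offspring inherit; the real search operates on full convolutional blocks rather than scalar sequences.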

Benchmarks

Benchmark | Methodology | Metrics
action-classification-on-charades | AssembleNet-101 | mAP: 58.6
action-classification-on-charades | AssembleNet | mAP: 58.6
action-classification-on-moments-in-time | AssembleNet | Top-1 accuracy: 34.27%; Top-5 accuracy: 62.71%
multimodal-activity-recognition-on-moments-in | AssembleNet | Top-1 accuracy: 34.27%; Top-5 accuracy: 62.71%
