| OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 98.6 | Omni-sourced Webly-supervised Learning for Video Recognition | |
| PERF-Net (multi-distilled S3D) | 98.6 | PERF-Net: Pose Empowered RGB-Flow Net | - |
| Two-Stream I3D (ImageNet+Kinetics pre-training) | 98.0 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | |
| Two-Stream I3D (Kinetics pre-training) | 97.8 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | |
| MARS+RGB+Flow (64 frames, Kinetics pretrained) | 97.8 | MARS: Motion-Augmented RGB Stream for Action Recognition | - |
| CCS + TSN (ImageNet+Kinetics pretrained) | 97.4 | Cooperative Cross-Stream Network for Discriminative Action Representation | - |
| R(2+1)D-TwoStream (Kinetics pretrained) | 97.3 | A Closer Look at Spatiotemporal Convolutions for Action Recognition | |