Multimodal Classification on VGGSound
Metrics
Top-1 Accuracy
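Top-1 accuracy counts a clip as correct only when the model's single highest-scoring class matches the ground-truth label. The sketch below assumes each clip already has one vector of class scores (for multimodal models, the scores after audio-visual fusion) and an integer label. The function name `top1_accuracy` is hypothetical and does not come from any of the listed papers' code.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose highest-scoring class equals the label.

    Assumption: logits[i] holds the (already fused) class scores for sample i,
    and labels[i] is its ground-truth class index.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")

    correct = 0
    for scores, label in zip(logits, labels):
        # argmax over class scores; on ties, max() keeps the first (lowest) index
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    toy_logits = [
        [0.1, 0.7, 0.2],    # predicts class 1
        [0.5, 0.3, 0.2],    # predicts class 0
        [0.2, 0.2, 0.6],    # predicts class 2
        [0.9, 0.05, 0.05],  # predicts class 0
    ]
    toy_labels = [1, 0, 1, 0]
    print(top1_accuracy(toy_logits, toy_labels))  # 3 of 4 correct -> 75.0
```

The scores in the table are reported on this percentage scale. The toy inputs above are made up for illustration and are not VGGSound data.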
Results
Performance results of various models on this benchmark
| Model | Top-1 Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| MMT | 66.2 | Multiscale Multimodal Transformer for Multimodal Action Recognition | - |
| CAV-MAE (Audio-Visual) | 65.9 | Contrastive Audio-Visual Masked Autoencoder | |
| UAVM | 65.8 | UAVM: Towards Unifying Audio and Visual Models | |
| AVT | 63.9 | AVT: Audio-Video Transformer for Multimodal Action Recognition | - |