Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

Abstract
Existing action recognition methods are typically actor-specific due to the intrinsic topological and appearance differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design and high maintenance costs. Moreover, they often focus on learning the visual modality alone and on single-label classification, whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to better represent the action classes. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms prior actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code is available at https://github.com/mondalanindya/MSQNet.
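The abstract describes MSQNet only at a high level. As a rough illustration of the general idea of class-level semantic queries for multi-label prediction, here is a minimal dependency-free sketch: each query fuses a class-name text embedding with a pooled video embedding, attends over video tokens, and is scored with an independent sigmoid rather than a softmax. This is not the authors' implementation. The element-wise-sum fusion, the single attention head without learned projections, the shared scoring vector `head`, and all function names (`semantic_queries`, `cross_attend`, `multi_label_scores`) are hypothetical, and the embeddings are made-up numbers rather than outputs of real text or video encoders.

```python
import math

# Hypothetical toy illustration of class-level semantic queries for
# multi-label action recognition (an assumption-laden sketch, not MSQNet's code).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def semantic_queries(text_emb, video_emb):
    """One query per class: class-name text embedding + pooled video embedding.
    The element-wise sum is an assumed fusion rule."""
    return [[t + v for t, v in zip(te, video_emb)] for te in text_emb]

def cross_attend(queries, tokens):
    """Single-head scaled dot-product attention from class queries to video
    tokens, with a residual connection and no learned projections."""
    d = len(tokens[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        attended = [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(d)]
        out.append([qj + aj for qj, aj in zip(q, attended)])
    return out

def multi_label_scores(text_emb, video_tokens, head):
    """Return one independent probability per class (sigmoid, not softmax)."""
    d = len(video_tokens[0])
    video_emb = [sum(tok[j] for tok in video_tokens) / len(video_tokens) for j in range(d)]
    decoded = cross_attend(semantic_queries(text_emb, video_emb), video_tokens)
    return [sigmoid(dot(x, head)) for x in decoded]

# Made-up 2-D embeddings for three class names and two video tokens.
text_emb = [[1.0, 0.0], [0.0, 1.0], [-3.0, 0.0]]
video_tokens = [[0.9, 0.1], [0.8, 0.3]]
head = [1.0, 1.0]  # hypothetical shared scoring vector

scores = multi_label_scores(text_emb, video_tokens, head)
predicted = [c for c, s in enumerate(scores) if s > 0.5]  # several classes may be active
print([round(s, 3) for s in scores], predicted)
```

Because each class is scored with its own sigmoid, more than one action can be predicted for the same clip (here classes 0 and 1); the 0.5 threshold is arbitrary. A single-label setting would instead take a softmax or argmax over the classes.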
Benchmarks
| Task | Dataset | Method | Metric | Score |
|---|---|---|---|---|
| Action recognition in videos | Charades | MSQNet | mAP | 47.57 |
| Action recognition in videos | HMDB51 | MSQNet | Accuracy | 93.25 |
| Action recognition | Animal Kingdom | MSQNet | mAP | 73.1 |
| Action recognition | Hockey | MSQNet | Accuracy | 3.05 |
| Action recognition | THUMOS14 | MSQNet | Accuracy | 83.16 |
| Zero-shot action recognition | Charades | MSQNet | mAP | 35.59 |
| Zero-shot action recognition | HMDB51 | MSQNet | Accuracy | 69.43 |
| Zero-shot action recognition | THUMOS14 | MSQNet | Accuracy | 75.33 |
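Several rows in the benchmark table report mAP, the usual metric for multi-label recognition. To show roughly what that number measures, here is a minimal pure-Python sketch of macro mean average precision: for each class, rank clips by score and average the precision at the rank of each positive, then average over classes. It assumes non-interpolated AP and skips classes that have no positive clip; the official Charades and Animal Kingdom evaluation scripts may treat ties, empty classes, or interpolation differently. The function names are illustrative and the scores are made up.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k taken at the rank k of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for k, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_matrix, label_matrix):
    """Macro mAP over classes; rows are clips, columns are action classes.
    Classes with no positive clip are skipped (one common convention)."""
    aps = []
    for c in range(len(score_matrix[0])):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps)

scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]  # made-up predictions: 3 clips x 2 classes
labels = [[1, 0], [0, 1], [1, 1]]              # a clip may carry several labels
print(round(100 * mean_average_precision(scores, labels), 2))  # 91.67 on a 0-100 scale
```

Class 0 gets AP = (1/1 + 2/3) / 2 = 5/6 and class 1 gets AP = 1, so the mean is 11/12. Leaderboards such as the one above typically quote this value multiplied by 100.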