HyperAIHyperAI

Command Palette

Search for a command to run...

5 months ago

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

Mondal Anindya ; Nag Sauradip ; Prada Joaquin M ; Zhu Xiatian ; Dutta Anjan

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

Abstract

Existing action recognition methods are typically actor-specific due to theintrinsic topological and apparent differences among the actors. This requiresactor-specific pose estimation (e.g., humans vs. animals), leading tocumbersome model design complexity and high maintenance costs. Moreover, theyoften focus on learning the visual modality alone and single-labelclassification whilst neglecting other available information sources (e.g.,class name text) and the concurrent occurrence of multiple actions. To overcomethese limitations, we propose a new approach called 'actor-agnostic multi-modalmulti-label action recognition,' which offers a unified solution for varioustypes of actors, including humans and animals. We further formulate a novelMulti-modal Semantic Query Network (MSQNet) model in a transformer-based objectdetection framework (e.g., DETR), characterized by leveraging visual andtextual modalities to represent the action classes better. The elimination ofactor-specific model designs is a key advantage, as it removes the need foractor pose estimation altogether. Extensive experiments on five publiclyavailable benchmarks show that our MSQNet consistently outperforms the priorarts of actor-specific alternatives on human and animal single- and multi-labelaction recognition tasks by up to 50%. Code is made available athttps://github.com/mondalanindya/MSQNet.

Code Repositories

mondalanindya/msqnet
Official
pytorch
Mentioned in GitHub

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Actor-agnostic Multi-label Action Recognition with Multi-modal Query | Papers | HyperAI