One Model is Not Enough: Ensembles for Isolated Sign Language Recognition
Zdeněk Krňoul, Miroslav Hlaváč, Matyáš Boháček, Jakub Kanis, Ivan Gruber, Marek Hrúz
Abstract
In this paper, we study sign language recognition, focusing on the recognition of isolated signs. The task is framed as a classification problem in which a sequence of frames (i.e., images) is assigned to one of a given set of sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on several data modalities, whereas SPOTER is evaluated with different types of preprocessing. All methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques and achieve a new state-of-the-art accuracy of 73.84% on the WLASL300 dataset by using the CMA-ES optimization method to find the best ensemble weights. Furthermore, we present an ensembling technique based on the Transformer model, which we call Neural Ensembler.
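The weighted-ensemble idea in the abstract can be illustrated with a minimal sketch. It assumes each model outputs per-class probabilities and that the ensemble score is a weighted sum of those probabilities; the abstract does not spell out the exact combination rule, so this is an assumption. The weight search below is a plain (1+λ) evolution strategy used as a simple stand-in for CMA-ES, not CMA-ES itself, and all function names and the toy data are hypothetical.

```python
import random


def ensemble_predict(probs_per_model, weights):
    """Return the argmax class per sample of the weighted sum of model probabilities.

    probs_per_model[m][i][c] is model m's probability for class c on sample i.
    """
    n_samples = len(probs_per_model[0])
    n_classes = len(probs_per_model[0][0])
    preds = []
    for i in range(n_samples):
        scores = [0.0] * n_classes
        for w, model_probs in zip(weights, probs_per_model):
            for c in range(n_classes):
                scores[c] += w * model_probs[i][c]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def tune_weights(probs_per_model, labels, iters=200, offspring=8, sigma=0.3, seed=0):
    """(1+lambda) evolution strategy over non-negative ensemble weights.

    A stand-in for CMA-ES (assumption): it keeps the best weight vector and
    replaces it only when a Gaussian-perturbed candidate scores higher.
    """
    rng = random.Random(seed)
    n_models = len(probs_per_model)
    best = [1.0 / n_models] * n_models  # start from uniform weights
    best_acc = accuracy(ensemble_predict(probs_per_model, best), labels)
    for _ in range(iters):
        for _ in range(offspring):
            cand = [max(0.0, w + rng.gauss(0.0, sigma)) for w in best]
            acc = accuracy(ensemble_predict(probs_per_model, cand), labels)
            if acc > best_acc:
                best, best_acc = cand, acc
    return best, best_acc


# Toy data (hypothetical): two models, four samples, three classes.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]]
model_b = [[0.4, 0.5, 0.1], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8], [0.5, 0.3, 0.2]]
labels = [0, 1, 2, 2]

uniform_acc = accuracy(ensemble_predict([model_a, model_b], [0.5, 0.5]), labels)
tuned_weights, tuned_acc = tune_weights([model_a, model_b], labels)
```

In practice the weights would be tuned on held-out validation predictions rather than on the test set, and a full CMA-ES implementation would also adapt the covariance of the search distribution instead of using a fixed `sigma`.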
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| sign-language-recognition-on-autsl | Ensemble - NTIS | Rank-1 Recognition Rate: 0.9637 |