Keyword Transformer: A Self-Attention Model for Keyword Spotting

Axel Berg, Mark O'Connor, Miguel Tairum Cruz

Abstract

The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12- and 35-command tasks, respectively.
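
To make "fully self-attentional" concrete, the following is a minimal NumPy sketch of a classifier that applies self-attention directly to spectrogram frames, with no convolutional or recurrent encoder in front. It is an illustration under stated assumptions, not the authors' implementation: the single attention head, the single layer, the class token, the learned positional embedding, the input shape (98 frames by 40 features) and every name (`init_params`, `classify_keyword`, `w_embed`, and so on) are hypothetical choices made for this example, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n_tokens, n_tokens)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

def init_params(time_frames, freq_bins, dim, n_classes, seed=0):
    # Hypothetical parameter set with small random values (untrained).
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.02, size=shape)
    return {
        "w_embed": w(freq_bins, dim),        # projects one frame to one token
        "cls": w(1, dim),                    # class token (assumption)
        "pos": w(time_frames + 1, dim),      # positional embedding (assumption)
        "w_q": w(dim, dim), "w_k": w(dim, dim), "w_v": w(dim, dim),
        "w_out": w(dim, n_classes),          # linear classification head
    }

def classify_keyword(spectrogram, params):
    """Map a (time_frames, freq_bins) spectrogram to class probabilities."""
    tokens = spectrogram @ params["w_embed"]           # one token per time frame
    tokens = np.vstack([params["cls"], tokens])        # prepend class token
    tokens = tokens + params["pos"]
    attended, _ = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    tokens = tokens + attended                         # residual connection
    return softmax(tokens[0] @ params["w_out"])        # read out the class token

# Usage: one random "utterance" classified into 12 commands.
params = init_params(time_frames=98, freq_bins=40, dim=64, n_classes=12)
spec = np.random.default_rng(1).normal(size=(98, 40))
probs = classify_keyword(spec, params)
print(probs.shape)
```

A practical model would stack several such layers with multiple heads, layer normalization and feed-forward blocks; the sketch keeps only the part that distinguishes an attention-only design from one that mixes in convolutional or recurrent layers.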

Benchmarks

Benchmark: keyword-spotting-on-google-speech-commands
Metrics: accuracy (%) on Google Speech Commands

| Methodology | V1 12 | V2 12 | V2 35 |
|---|---|---|---|
| KWT-1 | 97.26 ± 0.18 | 98.08 ± 0.10 | 96.95 ± 0.14 |
| KWT-2 | 97.27 ± 0.08 | 98.43 ± 0.08 | 97.74 ± 0.03 |
| KWT-3 | 97.49 ± 0.15 | 98.56 ± 0.07 | 97.69 ± 0.09 |