Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
Daniel Bermuth, Alexander Poeppel, Wolfgang Reif

Abstract
In Spoken Language Understanding (SLU), the task is to extract important information from audio commands, such as the intent of what a user wants the system to do and special entities like locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers which, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU models without any additional training. Building those models is very fast and takes only a few seconds. The method is also completely language independent. A comparison on different benchmarks shows that this method can outperform multiple other, more resource-demanding SLU approaches.
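To make the idea of embedding intents and entities into a transducer more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the `ToyFST` class, the intent names (`light_on`, `light_off`), the `room` slot and the example sentences are all hypothetical, chosen only for illustration. Each arc consumes one input word and optionally emits a slot label, and each final state carries an intent.

```python
from collections import defaultdict


class ToyFST:
    """Hypothetical toy transducer: arcs map (state, word) -> (next state, slot label)."""

    def __init__(self):
        self.arcs = defaultdict(dict)  # state -> {word: (next_state, slot_or_None)}
        self.finals = {}               # final state -> intent name
        self.next_id = 1               # state 0 is the start state

    def add_path(self, intent, tokens):
        """Add one sentence; tokens is a list of (word, slot_name_or_None)."""
        state = 0
        for word, slot in tokens:
            if word in self.arcs[state]:
                # Reuse the existing arc so shared prefixes share states.
                state = self.arcs[state][word][0]
            else:
                new_state = self.next_id
                self.next_id += 1
                self.arcs[state][word] = (new_state, slot)
                state = new_state
        self.finals[state] = intent

    def transduce(self, text):
        """Return intent and slots for an exactly matching transcript, else None."""
        state, slots = 0, {}
        for word in text.lower().split():
            arc = self.arcs[state].get(word)
            if arc is None:
                return None
            state, slot = arc
            if slot:
                # Multi-word entities are concatenated under one slot name.
                slots[slot] = (slots.get(slot, "") + " " + word).strip()
        intent = self.finals.get(state)
        return {"intent": intent, "slots": slots} if intent else None


fst = ToyFST()
prefix = [(w, None) for w in "turn on the light in the".split()]
for room in ["kitchen", "living room"]:
    fst.add_path("light_on", prefix + [(w, "room") for w in room.split()])
fst.add_path("light_off", [(w, None) for w in "turn off the light".split()])

result = fst.transduce("turn on the light in the living room")
print(result)
```

This toy version accepts only transcripts that match a stored sentence exactly. It does not model how the method couples the transducer with the Speech-to-Text model's output, and a real system would more likely rely on an established FST toolkit than on hand-written dictionaries.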
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| intent-classification-on-slurp | Finstreder (Quartznet) | Accuracy (%): 43.15 |
| intent-classification-on-slurp | Finstreder (Conformer) | Accuracy (%): 53.11 |
| slot-filling-on-slurp | Finstreder (Conformer) | F1: 0.395 |
| slot-filling-on-slurp | Finstreder (Quartznet) | F1: 0.313 |
| spoken-language-understanding-on-fluent | Finstreder (Quartznet + AMT) | Accuracy (%): 99.7 |
| spoken-language-understanding-on-fluent | Finstreder (Conformer + AMT, character-based) | Accuracy (%): 99.8 |
| spoken-language-understanding-on-fluent | Finstreder (Conformer) | Accuracy (%): 99.5 |
| spoken-language-understanding-on-fluent | Amazon Alexa | Accuracy (%): 98.7 |
| spoken-language-understanding-on-fluent | Finstreder (Quartznet) | Accuracy (%): 99.2 |
| spoken-language-understanding-on-snips | Finstreder (Conformer, character-based) | Accuracy (%): 89.0 |
| spoken-language-understanding-on-snips | Finstreder (Conformer) | Accuracy (%): 88.0 |
| spoken-language-understanding-on-snips | Finstreder (Quartznet) | Accuracy (%): 84.8 |
| spoken-language-understanding-on-snips-1 | Finstreder (Quartznet) | Accuracy-EN (%): 77.6; Accuracy-FR (%): 77.8 |
| spoken-language-understanding-on-snips-1 | Finstreder (Conformer, character-based) | Accuracy-EN (%): 87.9; Accuracy-FR (%): 86.5 |
| spoken-language-understanding-on-snips-1 | Finstreder (Conformer) | Accuracy-EN (%): 80.4; Accuracy-FR (%): 78.3 |
| spoken-language-understanding-on-timers-and | Finstreder (Quartznet) | Accuracy (%): 90.0 |
| spoken-language-understanding-on-timers-and | Finstreder (Conformer) | Accuracy (%): 95.4 |