
Abstract
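In Spoken Language Understanding (SLU), the task is to extract key information from spoken commands, such as the user's intent (what the user wants the system to do) and specific entities such as locations or numbers. This paper presents a simple method for embedding intents and entities into Finite State Transducers. Combined with a pretrained, general-purpose Speech-to-Text model, the method can build SLU models without any additional training. Building a model is very fast and takes only a few seconds, and the method is completely language-independent. Comparisons on several benchmarks show that it can outperform multiple other, more resource-intensive SLU approaches.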
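To make the core idea more concrete, here is a toy sketch. It is not the authors' implementation. Sentence templates with entity placeholders are expanded into a small deterministic transducer that reads words and emits slot labels, and a final state emits the intent. The names `TemplateFST`, `build_fst` and `transduce`, the `{name}` placeholder syntax, and the example intents are all hypothetical. The sketch also simplifies one key point: it transduces a finished transcript, whereas the paper couples the transducer with the Speech-to-Text model itself.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class State:
    # Outgoing arcs: input word -> (output label, next state id); "" = no output.
    arcs: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    # Intent label emitted if the input ends in this state (None = not final).
    final_output: Optional[str] = None


class TemplateFST:
    """Hypothetical toy transducer: words in, slot labels and an intent out."""

    def __init__(self) -> None:
        self.states: List[State] = [State()]  # state 0 is the start state

    def add_path(self, labeled_words: List[Tuple[str, str]], intent: str) -> None:
        # Shares prefixes like a trie; assumes templates never give the same
        # word two different labels at the same position (first one wins).
        cur = 0
        for word, label in labeled_words:
            arc = self.states[cur].arcs.get(word)
            if arc is None:
                self.states.append(State())
                nxt = len(self.states) - 1
                self.states[cur].arcs[word] = (label, nxt)
                cur = nxt
            else:
                cur = arc[1]
        self.states[cur].final_output = intent


def build_fst(templates: List[Tuple[str, str]],
              entities: Dict[str, List[str]]) -> TemplateFST:
    """Expand every template over its (single-word) entity values."""
    fst = TemplateFST()
    for intent, template in templates:
        choices = []
        for tok in template.split():
            if tok.startswith("{") and tok.endswith("}"):
                name = tok[1:-1]
                choices.append([(value, name) for value in entities[name]])
            else:
                choices.append([(tok, "")])  # literal word, no slot output
        for path in itertools.product(*choices):
            fst.add_path(list(path), intent)
    return fst


def transduce(fst: TemplateFST,
              transcript: str) -> Optional[Tuple[str, List[Tuple[str, str]]]]:
    """Return (intent, [(slot, value), ...]) or None if the text is out of grammar."""
    cur = 0
    slots: List[Tuple[str, str]] = []
    for word in transcript.split():
        arc = fst.states[cur].arcs.get(word)
        if arc is None:
            return None
        label, cur = arc
        if label:
            slots.append((label, word))
    intent = fst.states[cur].final_output
    return None if intent is None else (intent, slots)


templates = [("set_timer", "set a timer for {number} minutes"),
             ("cancel_timer", "cancel the timer")]
entities = {"number": ["one", "two", "five"]}
fst = build_fst(templates, entities)

print(transduce(fst, "set a timer for five minutes"))  # ('set_timer', [('number', 'five')])
print(transduce(fst, "set a timer for six minutes"))   # None
```

Because the grammar is compiled from templates and entity lists alone, changing what the system understands only means rebuilding this structure. No model weights are touched, which matches the paper's claim that building a model requires no additional training.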
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| intent-classification-on-slurp | Finstreder (Quartznet) | Accuracy (%): 43.15 |
| intent-classification-on-slurp | Finstreder (Conformer) | Accuracy (%): 53.11 |
| slot-filling-on-slurp | Finstreder (Conformer) | F1: 0.395 |
| slot-filling-on-slurp | Finstreder (Quartznet) | F1: 0.313 |
| spoken-language-understanding-on-fluent | Finstreder (Quartznet + AMT) | Accuracy (%): 99.7 |
| spoken-language-understanding-on-fluent | Finstreder (Conformer + AMT, character-based) | Accuracy (%): 99.8 |
| spoken-language-understanding-on-fluent | Finstreder (Conformer) | Accuracy (%): 99.5 |
| spoken-language-understanding-on-fluent | Amazon Alexa | Accuracy (%): 98.7 |
| spoken-language-understanding-on-fluent | Finstreder (Quartznet) | Accuracy (%): 99.2 |
| spoken-language-understanding-on-snips | Finstreder (Conformer, character-based) | Accuracy (%): 89.0 |
| spoken-language-understanding-on-snips | Finstreder (Conformer) | Accuracy (%): 88.0 |
| spoken-language-understanding-on-snips | Finstreder (Quartznet) | Accuracy (%): 84.8 |
| spoken-language-understanding-on-snips-1 | Finstreder (Quartznet) | Accuracy-EN (%): 77.6; Accuracy-FR (%): 77.8 |
| spoken-language-understanding-on-snips-1 | Finstreder (Conformer, character-based) | Accuracy-EN (%): 87.9; Accuracy-FR (%): 86.5 |
| spoken-language-understanding-on-snips-1 | Finstreder (Conformer) | Accuracy-EN (%): 80.4; Accuracy-FR (%): 78.3 |
| spoken-language-understanding-on-timers-and | Finstreder (Quartznet) | Accuracy (%): 90.0 |
| spoken-language-understanding-on-timers-and | Finstreder (Conformer) | Accuracy (%): 95.4 |