Lipreading on LRS2
Evaluation Metric
Word Error Rate (WER)
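For readers unfamiliar with the metric, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model's hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the percent scaling are assumptions made for illustration; the page does not say how each paper normalizes text or whether its scores are percentages, although that is the usual convention. Lower values are better.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Illustrative only: tokenization is a plain whitespace split, with no
    case folding or punctuation stripping (assumptions, not from the page).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100. Published results typically sum the edit distances over the whole test set and divide by the total number of reference words, rather than averaging per-utterance scores.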
Evaluation Results
Performance of each model on this benchmark
| Model Name | Word Error Rate (WER) | Paper Title | Repository |
| --- | --- | --- | --- |
| LIBS | 65.29 | Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | |
| TM-CTC + extLM | 54.7 | Deep Audio-Visual Speech Recognition | |
| CTC + KD ASR | 53.2 | ASR is all you need: cross-modal distillation for lip reading | - |
| Conv-seq2seq | 51.7 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | - |
| Hybrid CTC / Attention | 50 | Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | - |
| LF-MMI TDNN | 48.86 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | - |
| TM-seq2seq + extLM | 48.3 | Deep Audio-Visual Speech Recognition | |
| Multi-head Visual-Audio Memory | 44.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | |
| MoCo + wav2vec (w/o extLM) | 43.2 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | |
| Hybrid CTC / Attention | 39.1 | End-to-end Audio-visual Speech Recognition with Conformers | |
| CTC/Attention | 32.9 | Visual Speech Recognition for Multiple Languages in the Wild | |
| ES³ Base* | 31.4 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Base | 30.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Base* + extLM | 29.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| SyncVSR | 28.9 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | |
| VTP | 28.9 | Sub-word Level Lip Reading With Visual Attention | - |
| ES³ Base + extLM | 28.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| ES³ Large | 26.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
| CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Visual Speech Recognition for Multiple Languages in the Wild | |
| ES³ Large + extLM | 24.6 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | - |
Lipreading On Lrs2 | SOTA | HyperAI超神经