Lipreading on LRS2

Evaluation metric

Word Error Rate (WER)
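WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn the reference transcript into the model's output, divided by the number of reference words (N): WER = (S + D + I) / N. It is reported in percent, lower is better, and it can exceed 100% when the output contains many insertions. The sketch below assumes corpus-level aggregation (total edits over total reference words) and plain whitespace tokenization. Individual papers may normalize text differently (casing, punctuation), so it is not guaranteed to reproduce the exact numbers in the table. The function names `word_edit_distance` and `corpus_wer` are illustrative only.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions that
    turn the reference word list into the hypothesis word list."""
    prev = list(range(len(hyp) + 1))  # row 0: reaching hyp[:j] needs j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)   # column 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total edits / total reference words * 100.
    Assumes simple whitespace tokenization with no further text normalization."""
    edits = 0
    words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * edits / words


if __name__ == "__main__":
    pairs = [
        ("the cat sat on the mat", "the cat sit on mat"),  # 1 substitution + 1 deletion
        ("hello world", "hello world"),                    # exact match
    ]
    print(corpus_wer(pairs))  # 2 edits / 8 reference words -> 25.0
```

Summing edits and reference words across the whole test set, rather than averaging per-sentence WER, keeps short utterances from dominating the score.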

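Results

Performance of each model on this benchmark (WER in %, lower is better), listed from highest to lowest WER. A "-" in the Repository column means no code repository is linked.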

Model | WER (%) | Paper | Repository
LIBS | 65.29 | Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | available
TM-CTC + extLM | 54.7 | Deep Audio-Visual Speech Recognition | available
CTC + KD ASR | 53.2 | ASR is all you need: cross-modal distillation for lip reading | -
Conv-seq2seq | 51.7 | Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | -
Hybrid CTC / Attention | 50 | Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | -
LF-MMI TDNN | 48.86 | Audio-visual Recognition of Overlapped speech for the LRS2 dataset | -
TM-seq2seq + extLM | 48.3 | Deep Audio-Visual Speech Recognition | available
Multi-head Visual-Audio Memory | 44.5 | Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | available
MoCo + wav2vec (w/o extLM) | 43.2 | Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | available
Hybrid CTC / Attention | 39.1 | End-to-end Audio-visual Speech Recognition with Conformers | available
CTC/Attention | 32.9 | Visual Speech Recognition for Multiple Languages in the Wild | available
ES³ Base* | 31.4 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | -
ES³ Base | 30.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | -
ES³ Base* + extLM | 29.3 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | -
SyncVSR | 28.9 | SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | available
VTP | 28.9 | Sub-word Level Lip Reading With Visual Attention | -
ES³ Base + extLM | 28.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | -
ES³ Large | 26.7 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | -
CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Visual Speech Recognition for Multiple Languages in the Wild | available
ES³ Large + extLM | 24.6 | ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | -