4 months ago

Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark


Abstract

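Current automatic lyrics transcription (ALT) benchmarks focus exclusively on lexical content and ignore the finer nuances of written lyrics, including formatting and punctuation. This can cause a misalignment with the creative work of musicians and songwriters and with the listener's experience. For example, line breaks play an important role in conveying rhythm, emotional emphasis, rhyme, and high-level structure. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. Our contribution is twofold. First, we fully revise the transcripts specifically for ALT evaluation. The revision follows a newly created annotation guide that unifies music-industry guidelines and covers punctuation, line breaks, spelling, background vocals, and non-word sounds. Second, we design a suite of evaluation metrics that, unlike the traditional word error rate (WER), capture these phenomena. We hope the proposed benchmark advances the ALT task by enabling more precise and reliable evaluation of transcription systems, and that it improves the user experience of lyrics applications such as live captioning and karaoke subtitle rendering.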
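
For reference, the traditional word error rate is the word-level edit distance (substitutions, deletions, insertions) between the reference and the hypothesis, divided by the number of reference words. The Python sketch below is a generic WER computation, not the Jam-ALT evaluation code. The function names and the normalization regex are illustrative assumptions. The sketch lowercases the text and drops punctuation and line breaks before scoring, as conventional WER setups do. That normalization is the reason a plain WER cannot tell a well-formatted transcript from an unformatted one.

```python
import re


def normalize_words(text: str) -> list[str]:
    # Assumed conventional normalization: lowercase, keep only word characters
    # and apostrophes, so casing, punctuation and line breaks are discarded.
    return re.findall(r"[\w']+", text.lower())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Levenshtein distance over tokens, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref = normalize_words(reference)
    hyp = normalize_words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `wer("Hold me close,\nDon't let go", "hold me close don't let go")` evaluates to 0.0 even though the hypothesis loses the capitalization, the comma, and the line break.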


Benchmarks

All scores are percentages. Lower is better for WER and Case Error Rate; higher is better for the F1 scores. – = not reported.

Benchmark: automatic-lyrics-transcription-on-jam-alt
Method              WER    Case Error Rate  Punctuation F1  Line break F1  Section break F1  Parenthesis F1
Whisper v2          35.7   4.5              41.7            69.3           3.3               –
Whisper v2 +demucs  44.0   5.3              28.0            61.2           –                 –
Whisper v3          35.5   4.3              41.6            73.5           1.0               –
Whisper v3 +demucs  47.9   3.8              29.0            65.7           –                 –
AudioShake v1       26.0   3.4              50.5            82.3           72.1              29.4

Benchmark: automatic-lyrics-transcription-on-jam-alt-1
Method              WER    Case Error Rate  Punctuation F1  Line break F1  Section break F1  Parenthesis F1
Whisper v2          43.8   3.5              31.3            63.0           11.2              –
Whisper v2 +demucs  32.3   5.3              39.2            53.8           –                 –
Whisper v3          37.7   4.8              40.9            71.5           2.6               –
Whisper v3 +demucs  43.0   4.1              23.3            66.8           –                 –
AudioShake v1       22.1   3.4              59.0            80.7           77.4              32.4
LyricWhiz           24.6   3.5              34.0            74.0           1.4               –

Benchmark: automatic-lyrics-transcription-on-jam-alt-2
Method              WER    Case Error Rate  Punctuation F1  Line break F1  Section break F1  Parenthesis F1
Whisper v2          25.7   6.5              50.0            71.7           3.1               –
Whisper v2 +demucs  38.8   7.1              17.2            56.4           –                 –
Whisper v3          28.6   5.0              41.9            73.7           –                 –
Whisper v3 +demucs  61.5   3.6              28.7            52.4           –                 –
AudioShake v1       22.5   4.1              47.8            82.7           69.6              38.0

Benchmark: automatic-lyrics-transcription-on-jam-alt-3
Method              WER    Case Error Rate  Punctuation F1  Line break F1  Section break F1  Parenthesis F1
Whisper v2          45.4   5.3              38.7            69.9           –                 –
Whisper v2 +demucs  65.2   5.9              30.2            67.5           –                 –
Whisper v3          40.7   4.0              41.2            71.2           1.2               –
Whisper v3 +demucs  43.5   4.4              34.0            72.0           –                 –
AudioShake v1       24.4   4.1              48.5            81.2           69.2              8.1

Benchmark: automatic-lyrics-transcription-on-jam-alt-4
Method              WER    Case Error Rate  Punctuation F1  Line break F1  Section break F1  Parenthesis F1
Whisper v2          27.7   3.2              45.8            73.4           1.4               –
Whisper v2 +demucs  43.3   3.2              34.9            66.1           –                 –
Whisper v3          34.7   3.3              42.4            77.8           –                 –
Whisper v3 +demucs  44.9   3.2              30.9            69.4           –                 –
AudioShake v1       34.9   2.0              45.8            84.9           72.5              41.3
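
The F1 columns score formatting tokens rather than words. The Python sketch below shows one plausible way to compute such scores. It makes simplifying assumptions and is not the official Jam-ALT implementation. Both transcripts are tokenized so that punctuation marks, parentheses, line breaks, and blank-line section breaks become separate tokens. The two token sequences are then aligned by minimum edit distance. A true positive is counted whenever a target token in the reference is aligned to an identical token in the hypothesis. The token classes, the regular expression, and all function names are hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical token classes for this sketch (not the official Jam-ALT ones).
SECTION_BREAK = "\n\n"  # assumption: a blank line separates sections
LINE_BREAK = "\n"
PUNCT = set(",.!?;:\"")
PARENS = set("()")

# Blank line | single newline | word (with apostrophes) | any other symbol.
TOKEN_RE = re.compile(r"\n\s*\n|\n|[\w']+|[^\w\s']")

Pair = tuple[Optional[str], Optional[str]]


def tokenize(text: str) -> list[str]:
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.count("\n") >= 2:
            tokens.append(SECTION_BREAK)  # normalize any blank-line run
        else:
            tokens.append(tok.lower())    # words compared case-insensitively
    return tokens


def align(ref: list[str], hyp: list[str]) -> list[Pair]:
    # Minimum-edit-distance alignment with backtrace; None marks a gap.
    n, m = len(ref), len(hyp)
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    pairs: list[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # token missing from hypothesis
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # extra token in hypothesis
            j -= 1
    pairs.reverse()
    return pairs


def f1_for(ref_text: str, hyp_text: str, is_target: Callable[[str], bool]) -> float:
    ref, hyp = tokenize(ref_text), tokenize(hyp_text)
    tp = sum(1 for r, h in align(ref, hyp) if r is not None and r == h and is_target(r))
    n_ref = sum(map(is_target, ref))
    n_hyp = sum(map(is_target, hyp))
    if n_ref == 0 and n_hyp == 0:
        return 1.0  # assumption: nothing expected and nothing predicted
    precision = tp / n_hyp if n_hyp else 0.0
    recall = tp / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def is_punct(tok: str) -> bool:
    return tok in PUNCT


def is_paren(tok: str) -> bool:
    return tok in PARENS


def is_line_break(tok: str) -> bool:
    return tok == LINE_BREAK


def is_section_break(tok: str) -> bool:
    return tok == SECTION_BREAK
```

Take `ref = "Hold me close,\nDon't let go"` and `hyp = "hold me close\ndon't let go."`. In this case, `f1_for(ref, hyp, is_line_break)` returns 1.0, but `f1_for(ref, hyp, is_punct)` returns 0.0. The single line break is matched, whereas the reference comma is missing and the final period is spurious. Because a single alignment covers words and formatting tokens together, a formatting token only counts when it sits in the same place relative to the surrounding words.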
