5 months ago

Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark

Ondřej Cífka; Constantinos Dimitriou; Cheng-i Wang; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter

Abstract

Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content and ignore the finer nuances of written lyrics, including formatting and punctuation. This can lead to a misalignment with the creative products of musicians and songwriters as well as with listeners' experiences. For example, line breaks are important in conveying information about rhythm, emotional emphasis, rhyme, and high-level structure. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. Our contribution is twofold. First, we provide a complete revision of the transcripts, geared specifically towards ALT evaluation, following a newly created annotation guide that unifies the music industry's guidelines and covers aspects such as punctuation, line breaks, spelling, background vocals, and non-word sounds. Second, we propose a suite of evaluation metrics that, unlike the traditional word error rate, are designed to capture such phenomena. We hope that the proposed benchmark contributes to the ALT task, enabling more precise and reliable assessments of transcription systems and enhancing the user experience in lyrics applications such as subtitle rendering for live captioning or karaoke.
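For reference, the word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below computes it with a standard dynamic-programming table. The `normalize` helper, which lowercases the text and drops punctuation and line breaks, is a hypothetical simplification; the benchmark's own text normalization may differ. Such normalization discards exactly the formatting (case, punctuation, line breaks) that the additional Jam-ALT metrics are meant to capture.

```python
import re


def normalize(text: str) -> list[str]:
    """Hypothetical normalization: lowercase, keep only word characters and apostrophes."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "I walk the line\nBecause you're mine"
hypothesis = "i walk a line because your mine"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # 2 substitutions / 7 words -> 0.286
```

In this example, case, the line break, and the apostrophe-free "your" versus "you're" are handled very differently. Only the spelling difference counts as an error here; the missing line break and the lowercase "i" are invisible to WER.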

Benchmarks

Results are grouped by benchmark. A dash (-) marks a metric that is not reported for that system.

automatic-lyrics-transcription-on-jam-alt

| Methodology | WER | Case Error Rate | Punctuation F1 | Parenthesis F1 | Line break F1 | Section break F1 |
|---|---|---|---|---|---|---|
| Whisper v2 | 35.7 | 4.5 | 41.7 | - | 69.3 | 3.3 |
| Whisper v2 +demucs | 44.0 | 5.3 | 28.0 | - | 61.2 | - |
| Whisper v3 | 35.5 | 4.3 | 41.6 | - | 73.5 | 1.0 |
| Whisper v3 +demucs | 47.9 | 3.8 | 29.0 | - | 65.7 | - |
| AudioShake v1 | 26.0 | 3.4 | 50.5 | 29.4 | 82.3 | 72.1 |

automatic-lyrics-transcription-on-jam-alt-1

| Methodology | WER | Case Error Rate | Punctuation F1 | Parenthesis F1 | Line break F1 | Section break F1 |
|---|---|---|---|---|---|---|
| Whisper v3 +demucs | 43.0 | 4.1 | 23.3 | - | 66.8 | - |
| AudioShake v1 | 22.1 | 3.4 | 59.0 | 32.4 | 80.7 | 77.4 |
| LyricWhiz | 24.6 | 3.5 | 34.0 | - | 74.0 | 1.4 |
| Whisper v2 | 43.8 | 3.5 | 31.3 | - | 63.0 | 11.2 |
| Whisper v2 +demucs | 32.3 | 5.3 | 39.2 | - | 53.8 | - |
| Whisper v3 | 37.7 | 4.8 | 40.9 | - | 71.5 | 2.6 |

automatic-lyrics-transcription-on-jam-alt-2

| Methodology | WER | Case Error Rate | Punctuation F1 | Parenthesis F1 | Line break F1 | Section break F1 |
|---|---|---|---|---|---|---|
| Whisper v3 +demucs | 61.5 | 3.6 | 28.7 | - | 52.4 | - |
| AudioShake v1 | 22.5 | 4.1 | 47.8 | 38.0 | 82.7 | 69.6 |
| Whisper v2 +demucs | 38.8 | 7.1 | 17.2 | - | 56.4 | - |
| Whisper v2 | 25.7 | 6.5 | 50.0 | - | 71.7 | 3.1 |
| Whisper v3 | 28.6 | 5.0 | 41.9 | - | 73.7 | - |

automatic-lyrics-transcription-on-jam-alt-3

| Methodology | WER | Case Error Rate | Punctuation F1 | Parenthesis F1 | Line break F1 | Section break F1 |
|---|---|---|---|---|---|---|
| Whisper v2 | 45.4 | 5.3 | 38.7 | - | 69.9 | - |
| Whisper v2 +demucs | 65.2 | 5.9 | 30.2 | - | 67.5 | - |
| AudioShake v1 | 24.4 | 4.1 | 48.5 | 8.1 | 81.2 | 69.2 |
| Whisper v3 +demucs | 43.5 | 4.4 | 34.0 | - | 72.0 | - |
| Whisper v3 | 40.7 | 4.0 | 41.2 | - | 71.2 | 1.2 |

automatic-lyrics-transcription-on-jam-alt-4

| Methodology | WER | Case Error Rate | Punctuation F1 | Parenthesis F1 | Line break F1 | Section break F1 |
|---|---|---|---|---|---|---|
| Whisper v2 +demucs | 43.3 | 3.2 | 34.9 | - | 66.1 | - |
| AudioShake v1 | 34.9 | 2.0 | 45.8 | 41.3 | 84.9 | 72.5 |
| Whisper v3 +demucs | 44.9 | 3.2 | 30.9 | - | 69.4 | - |
| Whisper v2 | 27.7 | 3.2 | 45.8 | - | 73.4 | 1.4 |
| Whisper v3 | 34.7 | 3.3 | 42.4 | - | 77.8 | - |
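
The F1 columns above score formatting tokens rather than words. The following is a minimal sketch of one way a line-break F1 could be computed. It assumes lyrics are tokenized into lowercase words plus a line-break token, and that reference and hypothesis are aligned with Python's `difflib.SequenceMatcher`, a heuristic matcher swapped in here for illustration; the benchmark's own tokenization and alignment may differ. Blank lines (section breaks) are simply dropped, and the names `tokenize`, `line_break_f1`, and `LINE_BREAK` are hypothetical.

```python
import difflib
import re

LINE_BREAK = "\n"  # hypothetical token standing in for a line break


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, with one LINE_BREAK token between consecutive lines.

    Blank lines (section breaks) are dropped in this simplified sketch.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    tokens: list[str] = []
    for i, line in enumerate(lines):
        if i > 0:
            tokens.append(LINE_BREAK)
        tokens.extend(re.findall(r"[\w']+", line.lower()))
    return tokens


def line_break_f1(reference: str, hypothesis: str) -> float:
    """F1 of hypothesis line breaks that the alignment matches to reference line breaks."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    # A line break counts as a true positive if it lies inside a matched block.
    true_pos = sum(
        ref[m.a : m.a + m.size].count(LINE_BREAK)
        for m in matcher.get_matching_blocks()
    )
    n_ref = ref.count(LINE_BREAK)
    n_hyp = hyp.count(LINE_BREAK)
    precision = true_pos / n_hyp if n_hyp else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reference = "I walk the line\nBecause you're mine\nI keep my eyes wide open"
hypothesis = "I walk the line because you're mine\nI keep my eyes wide open"
print(round(line_break_f1(reference, hypothesis), 3))  # precision 1.0, recall 0.5 -> 0.667
```

Here the hypothesis has the correct words but merges the first two lines, so WER would be zero while line-break recall drops to 0.5. Precision penalizes line breaks a system invents and recall penalizes ones it misses; the same counting scheme could in principle be applied to other token classes such as punctuation marks.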
