5 months ago

Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors

Julien Hauret, Malo Olivier, Thomas Joubaud, Christophe Langrenne, Sarah Poirée, Véronique Zimpfer, Éric Bavu

Abstract

Vibravox is a dataset compliant with the General Data Protection Regulation (GDPR) containing audio recordings made with five different body-conduction audio sensors: two in-ear microphones, two bone-conduction vibration pickups, and a laryngophone. The dataset also includes audio from an airborne microphone used as a reference. The Vibravox corpus contains 38 hours of speech samples and physiological sounds recorded by 188 participants under different acoustic conditions imposed by a high-order ambisonics 3D spatializer. Annotations about the recording conditions and linguistic transcriptions are also included in the corpus. We conducted a series of experiments on various speech-related tasks, including speech recognition, speech enhancement, and speaker verification. These experiments were carried out using state-of-the-art models to evaluate and compare their performance on signals captured by the different audio sensors offered by the Vibravox dataset, with the aim of gaining a better grasp of their individual characteristics.

Code Repositories

jhauret/vibravox

Official

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
automatic-phoneme-recognition-on-vibravox	medium wav2vec2.0	Test PER: 0.028
automatic-phoneme-recognition-on-vibravox-1	medium wav2vec2.0	Test PER: 0.046
automatic-phoneme-recognition-on-vibravox-2	medium wav2vec2.0	Test PER: 0.041
automatic-phoneme-recognition-on-vibravox-3	medium wav2vec2.0	Test PER: 0.045
automatic-phoneme-recognition-on-vibravox-4	medium wav2vec2.0	Test PER: 0.073
automatic-phoneme-recognition-on-vibravox-5	medium wav2vec2.0	Test PER: 0.142
bandwidth-extension-on-vibravox	Configurable EBEN (M=4, P=2, Q=4)	EER (ECAPA2): 0.0364; Noresqa-MOS: 4.285; PER (wav2vec2): 0.084; STOI: 0.877
bandwidth-extension-on-vibravox-forehead	Configurable EBEN (M=4, P=4, Q=4)	EER (ECAPA2): 0.0183; Noresqa-MOS: 4.250; PER (wav2vec2): 0.091; STOI: 0.855
bandwidth-extension-on-vibravox-soft-in-ear	Configurable EBEN (M=4, P=2, Q=4)	EER (ECAPA2): 0.0488; Noresqa-MOS: 4.331; PER (wav2vec2): 0.087; STOI: 0.868
bandwidth-extension-on-vibravox-temple	Configurable EBEN (M=4, P=1, Q=4)	EER (ECAPA2): 0.1622; Noresqa-MOS: 3.632; PER (wav2vec2): 0.391; STOI: 0.763
bandwidth-extension-on-vibravox-throat	Configurable EBEN (M=4, P=2, Q=4)	EER (ECAPA2): 0.0847; Noresqa-MOS: 3.862; PER (wav2vec2): 0.179; STOI: 0.834
speaker-verification-on-vibravox-forehead	ECAPA2	Test EER: 0.009; Test min-DCF: 0.06
speaker-verification-on-vibravox-headset	ECAPA2	Test EER: 0.0026; Test min-DCF: 0.02
speaker-verification-on-vibravox-rigid-in-ear	ECAPA2	Test EER: 0.0316; Test min-DCF: 0.21
speaker-verification-on-vibravox-soft-in-ear	ECAPA2	Test EER: 0.0172; Test min-DCF: 0.10
speaker-verification-on-vibravox-temple	ECAPA2	Test EER: 0.08; Test min-DCF: 0.58
speaker-verification-on-vibravox-throat	ECAPA2	Test EER: 0.0353; Test min-DCF: 0.20
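
For readers unfamiliar with the two error metrics that recur in the table, the sketch below shows one common way to compute them. It is not the evaluation code behind the reported numbers. The function names (`edit_distance`, `phoneme_error_rate`, `equal_error_rate`) and the toy inputs are hypothetical, chosen only for illustration. PER is taken here as the total Levenshtein distance between reference and predicted phoneme sequences divided by the total number of reference phonemes. EER is approximated by sweeping a decision threshold over the observed scores and reporting the point where the false-acceptance and false-rejection rates are closest.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over the observed scores.

    A trial is accepted when its score is >= the threshold. The result is
    the mean of the false-acceptance and false-rejection rates at the
    threshold where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy data (made up for illustration, not taken from Vibravox).
refs = [["b", "o~", "Z", "u", "R"]]
hyps = [["b", "o", "Z", "u"]]
per = phoneme_error_rate(refs, hyps)

targets = [0.9, 0.8, 0.7, 0.4]      # same-speaker trial scores
nontargets = [0.6, 0.5, 0.3, 0.2]   # different-speaker trial scores
eer = equal_error_rate(targets, nontargets)
```

Both metrics are error rates, so lower values are better. With only a handful of trials the threshold sweep is coarse, and evaluation toolkits typically interpolate between operating points instead.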
