Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors

Julien Hauret, Malo Olivier, Thomas Joubaud, Christophe Langrenne, Sarah Poirée, Véronique Zimpfer, Éric Bavu

Abstract

Vibravox is a dataset compliant with the General Data Protection Regulation (GDPR) containing audio recordings made with five different body-conduction audio sensors: two in-ear microphones, two bone-conduction vibration pickups and a laryngophone. The dataset also includes audio data from an airborne microphone used as a reference. The Vibravox corpus contains 38 hours of speech samples and physiological sounds recorded by 188 participants under different acoustic conditions imposed by a high-order ambisonics 3D spatializer. Annotations about the recording conditions and linguistic transcriptions are also included in the corpus. We conducted a series of experiments on various speech-related tasks, including speech recognition, speech enhancement and speaker verification. These experiments were carried out using state-of-the-art models to evaluate and compare their performance on signals captured by the different audio sensors offered by the Vibravox dataset, with the aim of gaining a better grasp of their individual characteristics.

Code Repositories

jhauret/vibravox (official, PyTorch)

Benchmarks

Automatic phoneme recognition

Benchmark                                    Methodology        Test PER
automatic-phoneme-recognition-on-vibravox    medium wav2vec2.0  0.028
automatic-phoneme-recognition-on-vibravox-1  medium wav2vec2.0  0.046
automatic-phoneme-recognition-on-vibravox-2  medium wav2vec2.0  0.041
automatic-phoneme-recognition-on-vibravox-3  medium wav2vec2.0  0.045
automatic-phoneme-recognition-on-vibravox-4  medium wav2vec2.0  0.073
automatic-phoneme-recognition-on-vibravox-5  medium wav2vec2.0  0.142

Bandwidth extension

Benchmark                                    Methodology                        EER (ECAPA2)  NORESQA-MOS  PER (wav2vec2)  STOI
bandwidth-extension-on-vibravox              Configurable EBEN (M=4, P=2, Q=4)  0.0364        4.285        0.084           0.877
bandwidth-extension-on-vibravox-forehead     Configurable EBEN (M=4, P=4, Q=4)  0.0183        4.250        0.091           0.855
bandwidth-extension-on-vibravox-soft-in-ear  Configurable EBEN (M=4, P=2, Q=4)  0.0488        4.331        0.087           0.868
bandwidth-extension-on-vibravox-temple       Configurable EBEN (M=4, P=1, Q=4)  0.1622        3.632        0.391           0.763
bandwidth-extension-on-vibravox-throat       Configurable EBEN (M=4, P=2, Q=4)  0.0847        3.862        0.179           0.834

Speaker verification

Benchmark                                      Methodology  Test EER  Test min-DCF
speaker-verification-on-vibravox-forehead      ECAPA2       0.009     0.06
speaker-verification-on-vibravox-headset       ECAPA2       0.0026    0.02
speaker-verification-on-vibravox-rigid-in-ear  ECAPA2       0.0316    0.21
speaker-verification-on-vibravox-soft-in-ear   ECAPA2       0.0172    0.10
speaker-verification-on-vibravox-temple        ECAPA2       0.08      0.58
speaker-verification-on-vibravox-throat        ECAPA2       0.0353    0.20
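
To help read the PER and EER figures above, here is a minimal sketch of how these two metrics are commonly defined. It is not the authors' evaluation code: the function names `phoneme_error_rate` and `equal_error_rate` are hypothetical, the phoneme symbols and scores are made-up toy values, and the EER is approximated by a simple threshold sweep rather than by interpolation. Both functions return fractions, so a value of 0.028 corresponds to 2.8%.

```python
from typing import Sequence


def phoneme_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Levenshtein distance between two phoneme sequences, divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference phoneme missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis phoneme)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(reference)


def equal_error_rate(target_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> float:
    """Approximate EER: error rate at the threshold where false acceptance
    and false rejection rates are closest. A trial is accepted when its
    score is greater than or equal to the threshold."""
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example: one substitution and one deletion over five reference phonemes.
per = phoneme_error_rate(["b", "o~", "Z", "u", "R"], ["b", "o", "Z", "u"])
print(per)  # 0.4

# Toy example: same-speaker (target) and different-speaker (impostor) scores.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer)  # 0.25
```

Lower is better for both metrics. In the tables above, PER measures how well phonemes are recovered from each sensor's signal, while EER measures how reliably speakers are told apart.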
