4 months ago

An alternative Approach in Voice Extraction

Pham The Hieu ; Nguyen Phuong Thanh Tran ; Nguyen Xuan Tho ; Nguyen Tan Dat ; Nguyen Duc Dung

Abstract

Research on audio clue-based target speaker extraction (TSE) has mostly focused on modeling the mixture and reference speech, achieving high performance in English due to the availability of large datasets. However, less attention has been given to the consistent properties of human speech across languages. To bridge this gap, we introduce an alternative model which addresses the challenge of transferring TSE models from one language to another without fine-tuning. In this work, we propose a gating mechanism that is able to modify specific frequencies based on the speaker's acoustic features. The model achieves an SI-SDR of 17.3544 on clean English speech and 13.2032 on clean speech mixed with WHAM! noise, outperforming all other models in its ability to adapt to different languages.
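The abstract does not specify the form of the gating mechanism, so the following is only an illustrative sketch of the general idea of a speaker-conditioned frequency gate, not the authors' architecture. It assumes a speaker embedding is projected through a linear layer and a sigmoid to yield one gate value per frequency bin, which then scales every frame of a magnitude spectrogram. The names `frequency_gate` and `apply_gate`, the weights, and the toy dimensions are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def frequency_gate(speaker_embedding, weights, bias):
    """Map a speaker embedding to one gate value in (0, 1) per frequency bin.

    weights holds one row per frequency bin; each row has the same length
    as the embedding. This linear-plus-sigmoid form is an assumption made
    for illustration only.
    """
    return [
        sigmoid(sum(w * s for w, s in zip(row, speaker_embedding)) + b)
        for row, b in zip(weights, bias)
    ]


def apply_gate(spectrogram, gate):
    """Scale each frame of a magnitude spectrogram bin by bin.

    spectrogram is a list of frames; each frame is a list of magnitudes,
    one per frequency bin.
    """
    return [[g * v for g, v in zip(gate, frame)] for frame in spectrogram]


# Toy example: 2-dimensional embedding, 3 frequency bins, 2 frames.
embedding = [1.0, -1.0]
weights = [[4.0, 0.0], [0.0, 0.0], [0.0, 4.0]]
bias = [0.0, 0.0, 0.0]

gate = frequency_gate(embedding, weights, bias)
mixture = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
gated = apply_gate(mixture, gate)
```

In this toy setup the first bin is passed almost unchanged, the second is halved, and the third is strongly attenuated, which shows how a gate driven by speaker features can emphasize or suppress individual frequencies.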

Benchmarks

Benchmark | Methodology | Metrics
speech-separation-on-libri2mix | WHYV | SDR: 17.2458; SI-SDRi: 17.5
speech-separation-on-wham | WHYV | SI-SDRi: 12.964
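For readers unfamiliar with the metrics, the sketch below shows the commonly used definition of scale-invariant SDR (SI-SDR, in dB) and its improvement over the unprocessed mixture (SI-SDRi). It is not taken from the paper's evaluation code; the function names and the toy signals are illustrative assumptions.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to find the optimal scaling.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [alpha * b for b in t]
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(n * n for n in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy signals: a target tone, an interfering tone, and an imperfect estimate.
target = [math.sin(0.1 * i) for i in range(200)]
interferer = [math.cos(0.37 * i) for i in range(200)]
mixture = [t + n for t, n in zip(target, interferer)]
estimate = [t + 0.1 * n for t, n in zip(target, interferer)]

improvement = si_sdr_improvement(estimate, mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by a constant leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR.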
