3 months ago

Voice Conversion With Just Nearest Neighbors

Matthew Baas, Benjamin van Niekerk, Herman Kamper

Abstract

Any-to-any voice conversion aims to transform source speech into a target voice with just a few examples of the target speaker as a reference. Recent methods produce convincing conversions, but at the cost of increased complexity -- making results difficult to reproduce and build on. Instead, we keep it simple. We propose k-nearest neighbors voice conversion (kNN-VC): a straightforward yet effective method for any-to-any conversion. First, we extract self-supervised representations of the source and reference speech. To convert to the target speaker, we replace each frame of the source representation with its nearest neighbor in the reference. Finally, a pretrained vocoder synthesizes audio from the converted representation. Objective and subjective evaluations show that kNN-VC improves speaker similarity with similar intelligibility scores to existing methods. Code, samples, trained models: https://bshall.github.io/knn-vc
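The frame-matching step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `knn_convert`, the use of cosine similarity, and averaging over the k closest frames are assumptions of this sketch, and the random arrays stand in for self-supervised features that a real pipeline would extract from audio. The feature extractor and the vocoder are outside its scope.

```python
import numpy as np


def knn_convert(source, reference, k=1):
    """Replace each source frame with the mean of its k nearest reference frames.

    source:    (n_src, dim) array of source-speech features.
    reference: (n_ref, dim) array of target-speaker features.
    Cosine similarity and mean-pooling are assumptions of this sketch.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    src = source / np.linalg.norm(source, axis=1, keepdims=True)
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    sim = src @ ref.T  # shape (n_src, n_ref)

    # Indices of the k most similar reference frames for every source frame.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    # Gather the original (unnormalized) reference frames and average them.
    return reference[idx].mean(axis=1)


# Hypothetical stand-in features: 50 source frames, 200 reference frames.
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(50, 16))
reference_feats = rng.normal(size=(200, 16))
converted = knn_convert(source_feats, reference_feats, k=4)
```

Every output frame is built only from reference frames, which is why the converted features carry the target speaker's characteristics while following the source frame sequence.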

Code Repositories

bshall/knn-vc (official, PyTorch)

Benchmarks

Benchmark: voice-conversion-on-librispeech-test-clean
Methodology: kNN-VC (prematched HiFiGAN)
Metrics:
Character Error Rate (CER): 2.96
Equal Error Rate: 37.15
Word Error Rate (WER): 7.36
