
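Abstract
Deep-learning-based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian-language speech synthesis. Such an investigation is computationally expensive given the number and diversity of Indian languages, their relatively low resource availability, and the many advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this evaluation, we find that monolingual models built with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers, perform best. With this setup, we train and evaluate TTS models for 13 languages and find that our models significantly improve on existing models in every language, as measured by Mean Opinion Score (MOS). We open-source all models on the Bhashini platform.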
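Code repositories
| Repository | Framework | Notes |
|---|---|---|
| ai4bharat/indic-tts | PyTorch | Mentioned in GitHub |
| gokulkarthik/text2speech | PyTorch | Official; mentioned in GitHub |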
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| speech-synthesis-assamese-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 2.39 |
| speech-synthesis-bengali-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.37 |
| speech-synthesis-bodo-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.53 |
| speech-synthesis-gujarati-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.58 |
| speech-synthesis-hindi-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 4.00 |
| speech-synthesis-kannada-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.68 |
| speech-synthesis-malayalam-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.64 |
| speech-synthesis-manipuri-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.30 |
| speech-synthesis-marathi-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.26 |
| speech-synthesis-rajasthani-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.40 |
| speech-synthesis-tamil-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.84 |
| speech-synthesis-telugu-on-indictts | AI4BharatTTS - FastPitch with HiFiGAN | Mean Opinion Score: 3.66 |
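
As a quick way to read the table, the sketch below copies the twelve listed MOS values into a dictionary and reports the lowest-scoring language, the highest-scoring language, and the unweighted mean. The names `mos_by_language` and `summarize` are illustrative only, and the unweighted mean is a convenience figure computed here, not a statistic reported by the paper.

```python
from statistics import mean

# MOS values copied from the benchmark table above
# (IndicTTS benchmarks, "AI4BharatTTS - FastPitch with HiFiGAN").
mos_by_language = {
    "assamese": 2.39,
    "bengali": 3.37,
    "bodo": 3.53,
    "gujarati": 3.58,
    "hindi": 4.00,
    "kannada": 3.68,
    "malayalam": 3.64,
    "manipuri": 3.30,
    "marathi": 3.26,
    "rajasthani": 3.40,
    "tamil": 3.84,
    "telugu": 3.66,
}


def summarize(scores):
    """Return (lowest language, highest language, unweighted mean MOS)."""
    lowest = min(scores, key=scores.get)
    highest = max(scores, key=scores.get)
    return lowest, highest, mean(scores.values())


lowest, highest, average = summarize(mos_by_language)
print(f"languages listed: {len(mos_by_language)}")
print(f"lowest:  {lowest} ({mos_by_language[lowest]:.2f})")
print(f"highest: {highest} ({mos_by_language[highest]:.2f})")
print(f"unweighted mean MOS: {average:.2f}")
```

Because the mean weights every language equally, it says nothing about how many listeners or utterances were behind each score; it is only useful as a rough single-number summary of the rows shown here.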