Spoken Language Identification using ConvNets

Sarthak; Shikhar Shukla; Govind Mittal

Abstract

Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To identify languages, we can adopt either an implicit approach, where only the speech for a language is available, or an explicit one, where the speech is accompanied by its transcript. This paper focuses on the implicit approach because transcribed data is absent. It benchmarks existing models and proposes a new attention-based model for language identification that uses log-Mel spectrogram images as input. We also evaluate the effectiveness of raw waveforms as features for neural network models in LI tasks. For training and evaluation, we used speech from the VoxForge dataset, classifying six languages (English, French, German, Spanish, Russian and Italian) with an accuracy of 95.4% and four languages (English, French, German, Spanish) with an accuracy of 96.3%. The approach can be scaled further to incorporate more languages.
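
The abstract names log-Mel spectrogram images as the model input. As a rough illustration of what such an input looks like, the sketch below computes one with NumPy only. The sample rate, FFT size, hop length, number of mel bands, and the function names are all assumptions chosen for the example; the abstract does not state the settings the authors used.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(waveform, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Return a (n_mels, n_frames) log-Mel spectrogram of a 1-D waveform.

    All default parameter values are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop
    frames = np.stack(
        [waveform[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10).T                   # small offset avoids log(0)

# Usage: one second of a synthetic 440 Hz tone stands in for a speech clip.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)
```

The resulting 2-D array can be treated as a single-channel image, which is the form a 2D ConvNet would consume. A 1D ConvNet on raw waveforms would skip this step and take the sample array directly.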

Benchmarks

| Benchmark | Methodology | Accuracy (%) |
|---|---|---|
| keyword-spotting-on-voxforge | 1D-ConvNet | 93.7 |
| keyword-spotting-on-voxforge | 2D-ConvNet | 95.4 |
| spoken-language-identification-on-voxforge | 2D ConvNet (MixUp=YES) | 95.4 |
| spoken-language-identification-on-voxforge | 2D ConvNet (MixUp=NO) | 94.3 |
| spoken-language-identification-on-voxforge | 1D ConvNet (MixUp=NO) | 93.7 |
| spoken-language-identification-on-voxforge | 2D ConvNet with Attention and GRU (MixUp=YES) | 95.0 |
| spoken-language-identification-on-voxforge-1 | 1D ConvNet (MixUp=NO) | 94.4 |
| spoken-language-identification-on-voxforge-1 | 2D ConvNet with Attention and GRU (MixUp=YES) | 93.7 |
| spoken-language-identification-on-voxforge-1 | 2D ConvNet with Attention and GRU (MixUp=NO) | 94.7 |
| spoken-language-identification-on-voxforge-1 | 2D ConvNet (MixUp=NO) | 96.0 |
| spoken-language-identification-on-voxforge-1 | 2D ConvNet (MixUp=YES) | 96.3 |
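
The MixUp=YES and MixUp=NO entries in the benchmark listing indicate whether MixUp augmentation was used during training. For readers unfamiliar with it, here is a minimal sketch of the standard MixUp formulation, which blends pairs of inputs and their one-hot labels with a weight drawn from a Beta distribution. The function name `mixup_batch`, the value `alpha=0.2`, and the use of a single weight per batch are assumptions for illustration; the page does not give the authors' configuration.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    x:        array of shape (batch, ...), e.g. log-Mel spectrograms
    y_onehot: array of shape (batch, n_classes)
    alpha:    Beta distribution parameter (assumed value, not from the source)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(x))        # partner index for each example
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam

# Usage: a toy batch of 8 spectrogram-shaped inputs over 6 language classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 40, 97))
y = np.eye(6)[rng.integers(0, 6, size=8)]
x_mixed, y_mixed, lam = mixup_batch(x, y, rng=rng)
print(x_mixed.shape, y_mixed.shape)
```

Because the labels are mixed with the same weight as the inputs, each mixed label row still sums to 1 and can be used directly with a cross-entropy loss.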
