5 months ago

Hierarchical Pronunciation Assessment with Multi-Aspect Attention

Heejin Do; Yunsu Kim; Gary Geunbae Lee

Abstract

Automatic pronunciation assessment is a major component of computer-assisted pronunciation training systems. To provide in-depth feedback, it is essential to score pronunciation at multiple levels of granularity, such as phoneme, word, and utterance, and along diverse aspects, such as accuracy, fluency, and completeness. However, existing multi-aspect, multi-granularity methods predict all aspects at all granularity levels simultaneously, so they have difficulty capturing the linguistic hierarchy of phoneme, word, and utterance. This limitation also leads them to neglect the close relations between different aspects of the same linguistic unit. In this paper, we propose the Hierarchical Pronunciation Assessment with Multi-aspect Attention (HiPAMA) model, which represents the granularity levels hierarchically to capture their linguistic structure directly, and introduces multi-aspect attention that reflects associations across aspects at the same level to create richer representations. By obtaining relational information from both the granularity side and the aspect side, HiPAMA can take full advantage of multi-task learning. Experimental results on the speechocean762 dataset show remarkable improvements, demonstrating the robustness of HiPAMA, particularly on the difficult-to-assess aspects.
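The abstract names two mechanisms: representing the phoneme, word, and utterance levels hierarchically, and attending across aspects within the same linguistic unit. The Python sketch below illustrates both ideas under stated assumptions; it is not the HiPAMA implementation from doheejin/HiPAMA. It assumes mean pooling of phone vectors into word vectors (and of word vectors into an utterance vector) as the hierarchical step, and plain scaled dot-product self-attention with identity query/key/value projections as the multi-aspect step. The function names `pool_to_parent` and `multi_aspect_attention`, and all toy vectors, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_to_parent(children: List[Vector], parent_ids: List[int], n_parents: int) -> List[Vector]:
    """Mean-pool child-level vectors (e.g. phones) into their parent units (e.g. words).

    Assumption for illustration: mean pooling is the hierarchical step.
    parent_ids[i] is the index of the parent unit that children[i] belongs to.
    Parents with no children keep a zero vector.
    """
    dim = len(children[0])
    sums = [[0.0] * dim for _ in range(n_parents)]
    counts = [0] * n_parents
    for vec, pid in zip(children, parent_ids):
        counts[pid] += 1
        for d in range(dim):
            sums[pid][d] += vec[d]
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def multi_aspect_attention(aspects: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention across the aspect vectors of ONE unit.

    Each aspect vector (e.g. accuracy, fluency, completeness) is used as a query
    against all aspect vectors of the same unit, so every output mixes in
    information from the related aspects. Simplification: identity Q/K/V
    projections instead of learned weight matrices.
    """
    dim = len(aspects[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in aspects:
        weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in aspects])
        mixed.append([sum(w * value[d] for w, value in zip(weights, aspects)) for d in range(dim)])
    return mixed


# Toy usage: three phones belonging to two words, and two words in one utterance.
phones = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
words = pool_to_parent(phones, parent_ids=[0, 0, 1], n_parents=2)
utterance = pool_to_parent(words, parent_ids=[0, 0], n_parents=1)

# Hypothetical aspect vectors for the first word (e.g. accuracy and stress).
word_aspects = [[1.0, 0.0], [0.0, 1.0]]
mixed_aspects = multi_aspect_attention(word_aspects)
```

In a trained model, the attention would use learned projections over hidden states, and the pooled word and utterance vectors would feed level-specific scoring heads; those details are deliberately left out so the two structural ideas stay visible.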

Code Repositories

doheejin/HiPAMA (official implementation, PyTorch)

Benchmarks

Benchmark | Methodology | Pearson correlation coefficient (PCC)
Phone-level pronunciation scoring | HiPAMA-Librispeech | 0.62
Utterance-level pronunciation scoring | HiPAMA-Librispeech | 0.754
Word-level pronunciation scoring | HiPAMA-Librispeech | 0.59
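Every benchmark figure above is a Pearson correlation coefficient (PCC) between predicted scores and human-annotated scores. For reference, a minimal pure-Python PCC sketch follows; `pearson` is a hypothetical helper name, and in practice a library routine such as `scipy.stats.pearsonr` computes the same quantity. The toy scores are made up and are not results from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and reference scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and the two standard-deviation terms (unnormalized;
    # the 1/n factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("PCC is undefined when a sequence is constant")
    return cov / (std_x * std_y)


# Toy example with made-up scores (not results from the paper).
predicted = [1.0, 2.0, 3.0]
human = [2.0, 4.0, 6.0]
pcc = pearson(predicted, human)
```

A PCC of 1.0 means the predictions are a perfect positive linear function of the human scores, so values such as 0.754 indicate strong but imperfect agreement.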
