6 months ago

Jia Qi Yip Shengkui Zhao Yukun Ma Chongjia Ni Chong Zhang Hao Wang Trung Hieu Nguyen Kun Zhou Dianwen Ng Eng Siong Chng

Abstract

Dual-path is a popular architecture for speech separation models (e.g. Sepformer). It splits long sequences into overlapping chunks and feeds them to intra- and inter-blocks, which separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. We therefore propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure, which consists of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters. Model and weights are available at huggingface.co/yipjiaqi/spgm
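To make the data flow in the abstract more concrete, here is a minimal pure-Python sketch of the three ideas it names: overlapping chunking, parameter-free global pooling, and a small modulation step. It is not the authors' implementation. The function names, the single-channel scalar features, the 50% overlap, and the FiLM-style gain-and-shift form of the modulation (with hypothetical scalars `w_gain` and `w_shift` standing in for learned weights) are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def split_into_chunks(frames, chunk_size):
    """Split a 1-D feature sequence into chunks with 50% overlap.

    The tail is zero-padded so that every chunk is full (an assumption;
    the abstract only says the chunks overlap).
    """
    hop = chunk_size // 2
    n_chunks = max(1, math.ceil((len(frames) - chunk_size) / hop) + 1)
    total = (n_chunks - 1) * hop + chunk_size
    padded = list(frames) + [0.0] * (total - len(frames))
    return [padded[i * hop:i * hop + chunk_size] for i in range(n_chunks)]


def global_pool(chunks):
    """Parameter-free pooling: average each chunk over its time axis."""
    return [sum(chunk) / len(chunk) for chunk in chunks]


def modulate(chunks, pooled, w_gain=1.0, w_shift=0.1):
    """Assumed FiLM-style modulation of local features by a global summary.

    w_gain and w_shift are hypothetical stand-ins for the small set of
    learned parameters in the modulation module.
    """
    context = sum(pooled) / len(pooled)            # utterance-level summary
    gain = 1.0 / (1.0 + math.exp(-w_gain * context))  # sigmoid gate
    shift = w_shift * context
    return [[gain * x + shift for x in chunk] for chunk in chunks]


frames = [float(i) for i in range(1, 11)]          # toy 10-frame sequence
chunks = split_into_chunks(frames, chunk_size=4)   # local (intra-chunk) view
pooled = global_pool(chunks)                       # one summary per chunk
modulated = modulate(chunks, pooled)               # global info injected
```

In this sketch the pooling step has no parameters at all and the modulation step has only two, which mirrors the abstract's point that global information can be injected cheaply while the transformer layers stay dedicated to local modelling.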

SPGM: Prioritizing Local Features for enhanced speech separation performance
