5 months ago

GPU-accelerated Guided Source Separation for Meeting Transcription

Desh Raj; Daniel Povey; Sanjeev Khudanpur

Abstract

Guided source separation (GSS) is a type of target-speaker extraction method that relies on pre-computed speaker activities and blind source separation to perform front-end enhancement of overlapped speech signals. It was first proposed during the CHiME-5 challenge and provided significant improvements over the delay-and-sum beamforming baseline. Despite its strengths, however, the method has seen limited adoption for meeting transcription benchmarks primarily due to its high computation time. In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide 300x speed-up over CPU-based inference. The improved inference time allows us to perform detailed ablation studies over several parameters of the GSS algorithm -- such as context duration, number of channels, and noise class, to name a few. We provide end-to-end reproducible pipelines for speaker-attributed transcription of popular meeting benchmarks: LibriCSS, AMI, and AliMeeting. Our code and recipes are publicly available: https://github.com/desh2608/gss.
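
To make "batched processing of frequencies" concrete, here is a minimal sketch of one step that mask-based enhancement front-ends commonly need: a mask-weighted spatial covariance matrix per frequency bin. This is not the authors' implementation. The function names, array layout (frequency, channel, time), and the use of NumPy on the CPU are assumptions chosen for illustration only. The point is that a per-frequency Python loop can be replaced by a single array operation over all bins, which is the kind of formulation that maps well onto a GPU array library.

```python
import numpy as np


def masked_covariance_loop(stft, mask):
    """Per-frequency loop (reference version).

    stft: complex array of shape (F, C, T) -- frequency, channel, time.
    mask: real array of shape (F, T) with the target activity per bin.
    Returns an array of shape (F, C, C).
    """
    num_freq, num_chan, _ = stft.shape
    out = np.zeros((num_freq, num_chan, num_chan), dtype=complex)
    for f in range(num_freq):
        x = stft[f]   # (C, T)
        w = mask[f]   # (T,)
        # Weighted sum over time of x[:, t] x[:, t]^H, normalised by the mask sum.
        out[f] = (x * w) @ x.conj().T / max(w.sum(), 1e-10)
    return out


def masked_covariance_batched(stft, mask):
    """Same computation for all frequency bins in one call."""
    num = np.einsum("ft,fct,fdt->fcd", mask, stft, stft.conj())
    den = np.maximum(mask.sum(axis=-1), 1e-10)
    return num / den[:, None, None]
```

Both functions return the same result; the batched version simply removes the Python-level loop over frequency bins. The same idea extends to batching over segments by adding a leading batch axis.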

Code Repositories

desh2608/gss (official; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
speech-recognition-on-libricss | GSS + Transducer | Word Error Rate (WER): 3.30
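
For readers unfamiliar with the metric, the standard definition of word error rate is the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words and usually reported as a percentage. The sketch below implements that textbook definition. The function name is hypothetical, and the benchmark's actual scoring (text normalisation, handling of multiple speakers) may differ from this simplified version.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Under this definition, a WER of 3.30 corresponds to roughly 3.3 word errors per 100 reference words.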
