5 months ago

GPU-accelerated Guided Source Separation for Meeting Transcription

Desh Raj, Daniel Povey, Sanjeev Khudanpur

Abstract

Guided source separation (GSS) is a type of target-speaker extraction method that relies on pre-computed speaker activities and blind source separation to perform front-end enhancement of overlapped speech signals. It was first proposed during the CHiME-5 challenge and provided significant improvements over the delay-and-sum beamforming baseline. Despite its strengths, however, the method has seen limited adoption for meeting transcription benchmarks primarily due to its high computation time. In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide 300x speed-up over CPU-based inference. The improved inference time allows us to perform detailed ablation studies over several parameters of the GSS algorithm -- such as context duration, number of channels, and noise class, to name a few. We provide end-to-end reproducible pipelines for speaker-attributed transcription of popular meeting benchmarks: LibriCSS, AMI, and AliMeeting. Our code and recipes are publicly available: https://github.com/desh2608/gss.
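To give a rough sense of what "batched processing of frequencies" can mean, the sketch below computes a mask-weighted spatial covariance matrix (a quantity commonly used in mask-based beamforming) first with a per-frequency loop and then for all frequency bins in a single call. This is an illustration under stated assumptions, not the code in desh2608/gss: the function names, the array layout (frequency, channel, time), and the choice of this particular computation are all hypothetical, and the example runs on CPU with NumPy.

```python
import numpy as np


def masked_covariance_loop(stft, mask):
    """Per-frequency loop (hypothetical helper).

    stft: complex array of shape (F, C, T) -- frequency, channel, time.
    mask: real array of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    num_freqs, num_channels, _ = stft.shape
    out = np.zeros((num_freqs, num_channels, num_channels), dtype=stft.dtype)
    for f in range(num_freqs):
        weighted = stft[f] * mask[f]  # (C, T), mask broadcast over channels
        norm = max(mask[f].sum(), 1e-10)  # guard against an all-zero mask
        out[f] = weighted @ stft[f].conj().T / norm
    return out


def masked_covariance_batched(stft, mask):
    """Same computation for every frequency bin in one einsum call."""
    num = np.einsum("ft,fct,fdt->fcd", mask, stft, stft.conj())
    den = np.maximum(mask.sum(axis=-1), 1e-10)[:, None, None]
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, C, T = 8, 4, 50  # small sizes chosen only for the demonstration
    stft = rng.standard_normal((F, C, T)) + 1j * rng.standard_normal((F, C, T))
    mask = rng.uniform(size=(F, T))
    same = np.allclose(
        masked_covariance_loop(stft, mask),
        masked_covariance_batched(stft, mask),
    )
    print("loop and batched results agree:", same)
```

The batched form removes the Python-level loop, so the whole computation becomes one array operation. Operations of this shape are the kind that map well onto GPU array libraries, which is the general idea behind the speed-up the abstract describes.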

Code Repositories

desh2608/diarizer

Official

desh2608/gss

Official

Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
speech-recognition-on-libricss	GSS + Transducer	Word Error Rate (WER): 3.30
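For readers unfamiliar with the metric, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words, usually reported as a percentage. The sketch below is a minimal, generic implementation using word-level edit distance. The function name is hypothetical, it is not the scoring tool used to produce the number above, and meeting benchmarks may apply additional scoring conventions that are not covered here.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-tokenized strings.

    Hypothetical helper for illustration: computes the word-level
    Levenshtein distance and divides by the reference length.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```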
