Exploring Train and Test-Time Augmentations for Audio-Language Learning

Eungbeom Kim, Jinhee Kim, Yoori Oh, Kyungsu Kim, Minju Park, Jaeheon Sim, Jinwoo Lee, Kyogu Lee

Abstract

In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance. We explore various augmentation methods at both train-time and test-time and find that proper data augmentation can lead to substantial improvements. Specifically, applying our proposed audio-language paired augmentation, PairMix, the first multi-modal audio-language augmentation method, outperforms the baselines on both automated audio captioning and audio-text retrieval. To take full advantage of data augmentation at inference, we also present multi-level test-time augmentation (Multi-TTA). We combine the two proposed methods with uni-modal augmentations and achieve 47.5 SPIDEr on audio captioning, an 18.2% relative improvement over the baseline. The proposed methods also improve performance on audio-text retrieval.
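
The sketch below shows what a paired audio-caption augmentation in the spirit of PairMix might look like. The Beta-sampled waveform mixing weight and the naive caption fusion are assumptions made for illustration only, not the paper's exact formulation.

```python
# Hypothetical sketch of a paired audio-caption augmentation in the spirit of
# PairMix. The Beta-sampled mixing weight and the caption fusion rule are
# illustrative assumptions, not the paper's exact method.
import numpy as np

def pairmix_sketch(audio_a, caption_a, audio_b, caption_b, alpha=0.5, rng=None):
    """Mix two audio-caption pairs into one augmented pair.

    audio_a, audio_b : 1-D numpy arrays of equal length (waveforms).
    caption_a, caption_b : caption strings.
    alpha : Beta distribution parameter controlling the mixing weight.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                      # mixing weight in (0, 1)
    mixed_audio = lam * audio_a + (1.0 - lam) * audio_b
    mixed_caption = f"{caption_a} and {caption_b}"    # naive caption fusion
    return mixed_audio, mixed_caption

# Usage example with dummy data
if __name__ == "__main__":
    sr = 16000
    a = np.random.randn(sr)   # 1 second of noise standing in for clip A
    b = np.random.randn(sr)   # 1 second of noise standing in for clip B
    audio, caption = pairmix_sketch(a, "a dog barks", b, "rain falls on a roof")
    print(audio.shape, caption)
```

For test-time augmentation, a minimal sketch of score aggregation over augmented views is given below, assuming simple averaging at the input level. The paper's Multi-TTA aggregates across multiple levels (e.g., input and feature levels), which is not reproduced here; the augmentations and scoring function are placeholders.

```python
# Hypothetical sketch of test-time augmentation aggregation. Plain score
# averaging over waveform-level views is an illustrative assumption; Multi-TTA
# as proposed in the paper operates over multiple levels of the model.
import numpy as np

def tta_average(score_fn, audio, augmentations):
    """Average scores from score_fn over the original and augmented views."""
    views = [audio] + [aug(audio) for aug in augmentations]
    return np.mean([score_fn(v) for v in views], axis=0)

# Usage example with dummy scoring and simple waveform-level augmentations
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)
    augs = [
        lambda x: x[::-1],                                    # time reversal
        lambda x: x + 0.005 * rng.standard_normal(x.shape),   # additive noise
    ]
    dummy_score_fn = lambda x: np.array([float(np.abs(x).mean())])
    print(tta_average(dummy_score_fn, audio, augs))
```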

Benchmarks

Benchmark                        Methodology   Metrics
audio-captioning-on-audiocaps    AL-MixGen     CIDEr: 0.755, SPICE: 0.177, SPIDEr: 0.466
