A Simple Baseline for Audio-Visual Scene-Aware Dialog

Tamir Hazan, Alexander G. Schwing, Idan Schwartz

Abstract

The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven approach to learning virtual assistants, smart speakers, and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from the plethora of sensors that pound the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. Our method uses an attention mechanism to differentiate, in a data-driven manner, useful signals from distracting ones. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that allow it to outperform the current state-of-the-art by more than 20% on CIDEr.
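The idea of using attention to separate useful signals from distracting ones can be illustrated with a minimal sketch. This is not the architecture from the paper, which the abstract does not specify; it is plain dot-product attention over a few toy feature vectors. The names `attend`, `question`, and `signals`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention (illustrative, not the paper's model).

    Each feature vector is scored by its dot product with the query,
    the scores are normalized with softmax, and the features are
    averaged using those weights.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, pooled


# Hypothetical toy inputs: a question embedding and one vector per modality.
question = [1.0, 0.0]
signals = {
    "audio": [0.1, 0.9],
    "video": [2.0, 0.2],
    "caption": [0.5, 0.5],
}
names = list(signals)
weights, pooled = attend(question, [signals[n] for n in names])

for name, w in zip(names, weights):
    print(f"{name}: {w:.3f}")
```

In this toy setup the video vector aligns best with the question, so it receives the largest weight, while the less relevant audio vector is down-weighted. In a trained model the query and feature vectors would be learned end-to-end rather than fixed by hand.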

Benchmarks

Benchmark: scene-aware-dialogue-on-avsd
Methodology: simple
Metrics: CIDEr: 0.941
