Abstract

The recently proposed audio-visual scene-aware dialog task paves the way toward a more data-driven approach to learning virtual assistants, smart speakers, and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from the plethora of sensors that pound the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog that is trained end-to-end. Using an attention mechanism, our method differentiates useful signals from distracting ones in a data-driven manner. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that allow it to outperform the current state of the art by more than 20% on CIDEr.
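To make the attention idea concrete, here is a minimal sketch of how softmax weights can emphasize relevant signals and suppress distracting ones. It uses generic dot-product attention, which is an assumption and not necessarily the formulation used in the paper. The names `attend`, `question`, and `signals`, as well as the toy vectors, are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """Weight each feature vector by its dot-product relevance to the query.

    Returns (weights, fused), where fused is the weighted sum of the features.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, features))
             for i in range(dim)]
    return weights, fused

# Hypothetical toy inputs: a question embedding and one vector per signal.
question = [1.0, 0.0]
signals = {
    "video": [2.0, 0.0],    # aligned with the question
    "audio": [0.0, 2.0],    # orthogonal to the question (a distractor here)
    "history": [1.0, 1.0],  # partially aligned
}

weights, fused = attend(question, list(signals.values()))
for name, w in zip(signals, weights):
    print(f"{name}: {w:.3f}")
```

In this toy setup the signal most aligned with the question receives the largest weight, and the orthogonal one receives the smallest. In a trained model the embeddings, and therefore the weights, would be learned from data rather than set by hand.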

Tamir Hazan Alexander G. Schwing Idan Schwartz

A Simple Baseline for Audio-Visual Scene-Aware Dialog
