FiLM: Visual Reasoning with a General Conditioning Layer

Ethan Perez; Florian Strub; Harm de Vries; Vincent Dumoulin; Aaron Courville

Abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
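The feature-wise affine transformation described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each feature map (channel) is scaled by one coefficient and shifted by another, written here as `gamma` and `beta`, and that both are computed from the conditioning input. The `linear_generator` helper, its weights, and the toy values are hypothetical and chosen only for the example.

```python
def film(feature_maps, gamma, beta):
    """Apply gamma[c] * x + beta[c] to every value x in channel c.

    feature_maps: list of channels, each a flat list of activations.
    gamma, beta: one scale and one shift per channel.
    """
    if not (len(feature_maps) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    return [
        [g * x + b for x in channel]
        for channel, g, b in zip(feature_maps, gamma, beta)
    ]


def linear_generator(cond, weights, bias):
    """Hypothetical stand-in for the network that maps conditioning
    information (e.g. an encoded question) to per-channel coefficients."""
    return [
        sum(w * c for w, c in zip(row, cond)) + b0
        for row, b0 in zip(weights, bias)
    ]


# Toy example: 2 channels with 2 activations each, 2-dim conditioning vector.
feature_maps = [[1.0, 2.0], [3.0, 4.0]]
cond = [1.0, -1.0]

gamma = linear_generator(cond, [[0.5, 0.0], [0.0, 1.0]], [1.0, 1.0])
beta = linear_generator(cond, [[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0])

modulated = film(feature_maps, gamma, beta)
```

In this toy run the second channel receives a scale of zero, so its activations are replaced by the shift alone. This shows how conditioning information can suppress or pass individual features without changing the rest of the network.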

Code Repositories

GuessWhatGame/clevr: tf, Mentioned in GitHub
jjgo/hyperlight: pytorch, Mentioned in GitHub
ethanjperez/film: pytorch, Official
kdaip/stabletts: pytorch, Mentioned in GitHub
keonlee9420/Daft-Exprt: pytorch, Mentioned in GitHub
caffeinism/film-pytorch: pytorch, Mentioned in GitHub
CPJKU/audio_conditioned_unet: pytorch, Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
image-retrieval-with-multi-modal-query-on-mit | FiLM | Recall@1: 10.1, Recall@5: 27.7, Recall@10: 38.3
visual-question-answering-on-clevr | CNN+GRU+FiLM | Accuracy: 97.7
visual-question-answering-on-clevr-humans | CNN+GRU+FiLM | Accuracy: 75.9
