
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

Qi Wu; Chunhua Shen; Anton van den Hengel; Peng Wang; Anthony Dick

Abstract

Much recent progress in vision-to-language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement over the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. In particular, it allows questions to be asked about the contents of an image even when the image itself does not contain a complete answer. Our final model achieves the best reported results on both image captioning and visual question answering on several benchmark datasets.
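
The abstract describes conditioning a CNN-RNN on explicit high-level concepts and reusing the same mechanism for external knowledge. The NumPy sketch below illustrates one way such conditioning can work; it is not the authors' implementation. It assumes a multi-label attribute classifier on top of a CNN image feature, projects the resulting attribute-probability vector into the LSTM input space, and feeds it to the LSTM before any word token. All sizes, the random weights, and the names `w_att`, `w_ea`, `w_ek`, `run_prefix`, `greedy_caption` and `encode_question` are hypothetical, so the generated token ids are meaningless. For question answering, the sketch simply feeds a knowledge-base embedding as a second prefix step; how the paper actually combines these signals is not stated in this abstract.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only; not the paper's settings.
NUM_ATTRIBUTES, FEAT_DIM, KB_DIM = 256, 512, 300
EMBED_DIM, HIDDEN_DIM, VOCAB_SIZE = 64, 128, 1000
START_ID = 0  # assumed id of the sentence-start token


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """A single-layer LSTM cell written out in NumPy."""

    def __init__(self, input_dim, hidden_dim, rng):
        # One stacked matrix for the input, forget, output and candidate gates.
        self.w = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.h_dim = hidden_dim

    def step(self, x, h, c):
        z = self.w @ np.concatenate([x, h]) + self.b
        H = self.h_dim
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        return o * np.tanh(c), c


def make_params(seed=0):
    # Random (untrained) weights; a real model would learn all of these.
    rng = np.random.default_rng(seed)
    return {
        "w_att": rng.normal(0.0, 0.1, (NUM_ATTRIBUTES, FEAT_DIM)),  # attribute classifier
        "w_ea": rng.normal(0.0, 0.1, (EMBED_DIM, NUM_ATTRIBUTES)),   # attributes -> LSTM input
        "w_ek": rng.normal(0.0, 0.1, (EMBED_DIM, KB_DIM)),           # knowledge -> LSTM input
        "embed": rng.normal(0.0, 0.1, (VOCAB_SIZE, EMBED_DIM)),      # word embeddings
        "w_out": rng.normal(0.0, 0.1, (VOCAB_SIZE, HIDDEN_DIM)),     # hidden -> word logits
        "lstm": TinyLSTM(EMBED_DIM, HIDDEN_DIM, rng),
    }


def predict_attributes(image_feature, p):
    # Multi-label prediction: an independent probability for each attribute.
    return sigmoid(p["w_att"] @ image_feature)


def run_prefix(context_inputs, p):
    # Feed the high-level context vectors to the LSTM before any word token.
    h, c = np.zeros(HIDDEN_DIM), np.zeros(HIDDEN_DIM)
    for x in context_inputs:
        h, c = p["lstm"].step(x, h, c)
    return h, c


def greedy_caption(image_feature, p, max_len=5):
    # Captioning: the attribute vector is the only prefix input.
    v_att = predict_attributes(image_feature, p)
    h, c = run_prefix([p["w_ea"] @ v_att], p)
    token, tokens = START_ID, []
    for _ in range(max_len):
        h, c = p["lstm"].step(p["embed"][token], h, c)
        token = int(np.argmax(p["w_out"] @ h))  # greedy choice of the next word
        tokens.append(token)
    return v_att, tokens


def encode_question(image_feature, kb_vector, question_ids, p):
    # VQA: attributes and an external-knowledge embedding form the prefix,
    # followed by the question words.
    v_att = predict_attributes(image_feature, p)
    h, c = run_prefix([p["w_ea"] @ v_att, p["w_ek"] @ kb_vector], p)
    for tid in question_ids:
        h, c = p["lstm"].step(p["embed"][tid], h, c)
    return h  # an answer classifier or decoder would read from this state


if __name__ == "__main__":
    params = make_params()
    rng = np.random.default_rng(1)
    feat = rng.normal(size=FEAT_DIM)
    v_att, tokens = greedy_caption(feat, params)
    state = encode_question(feat, rng.normal(size=KB_DIM), [5, 17, 42], params)
    print(v_att.shape, tokens, state.shape)
```

Because the extra information enters only through the prefix inputs, adding another source (here the knowledge vector) needs one more projection matrix rather than a change to the recurrent cell.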

Benchmarks

Benchmark: visual-question-answering-on-coco-visual-4
Methodology: CNN-RNN
Metrics: Percentage correct: 59.5