VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks

Soyeon Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon

Abstract

Text-to-image multimodal tasks, which generate or retrieve an image from a given text description, are extremely challenging because raw text descriptions carry too little information to fully describe visually realistic images. We propose VICTR, a new visual contextual text representation for text-to-image multimodal tasks that captures rich visual semantic information about objects from the text input. First, we take the text description as the initial input and apply dependency parsing to extract its syntactic structure and analyse its semantic aspects, including object quantities, to build a scene graph. Then, we train Graph Convolutional Networks on the objects, attributes, and relations extracted into the scene graph, together with the corresponding geometric relation information, to generate a text representation that integrates textual and visual semantic information. This text representation is aggregated with word-level and sentence-level embeddings to produce both visual contextual word and sentence representations. For evaluation, we attached VICTR to state-of-the-art text-to-image generation models. VICTR is easily added to existing models and improves them in both quantitative and qualitative aspects.
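The graph-convolution step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hand-built toy scene graph for the phrase "a brown dog sitting on a bench", one-hot node features, a fixed hypothetical weight matrix, and the commonly used symmetric normalisation of the adjacency matrix with self-loops. Dependency parsing, geometric relation information, training, and the aggregation with word-level and sentence-level embeddings are all left out.

```python
import math

# Toy scene graph (assumed for illustration): objects, attributes and
# relations are all treated as nodes, as in a typical scene-graph encoding.
nodes = ["dog", "brown", "sitting on", "bench"]
edges = [(0, 1), (0, 2), (2, 3)]  # dog-brown, dog-sitting on, sitting on-bench


def build_adjacency(num_nodes, edges):
    """Undirected adjacency matrix with self-loops added."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    for i in range(num_nodes):
        adj[i][i] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalisation: entry (i, j) is divided by sqrt(deg_i * deg_j)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: ReLU(adj_norm @ features @ weight)."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# One-hot input features (one row per node) and a hypothetical 4x2 weight
# matrix; in a trained model the weights would be learned.
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weight = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.3, 0.1]]

adj_norm = normalize(build_adjacency(len(nodes), edges))
node_repr = gcn_layer(adj_norm, features, weight)

for name, vec in zip(nodes, node_repr):
    print(name, [round(v, 3) for v in vec])
```

Each output row mixes a node's own features with those of its neighbours, so the vector for "dog" already reflects "brown" and "sitting on" after a single layer. Stacking a second layer would let information from "bench" reach "dog" as well.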

Code Repositories

usydnlp/VICTR (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
text-to-image-generation-on-coco | StackGAN + VICTR | Inception score: 10.38
text-to-image-generation-on-coco | DM-GAN + VICTR | FID: 32.37; Inception score: 32.37
text-to-image-generation-on-coco | AttnGAN + VICTR | FID: 29.26; Inception score: 28.18
