Dual-Path Convolutional Image-Text Embeddings with Instance Loss

Zheng Zhedong; Zheng Liang; Garrett Michael; Yang Yi; Xu Mingliang; Shen Yi-Dong

Abstract

Matching images and sentences demands a fine-grained understanding of both modalities. In this paper, we propose a new system that discriminatively embeds images and text into a shared visual-textual space. In this field, most existing works apply the ranking loss to pull positive image/text pairs close together and push negative pairs apart. However, directly deploying the ranking loss makes network learning hard, since it must build an inter-modal relationship starting from two heterogeneous features. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on the unsupervised assumption that each image/text group can be viewed as a class, so the network can learn fine-grained distinctions from every image/text group. Experiments show that the instance loss offers a better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. Besides, existing works usually rely on off-the-shelf features, such as word2vec and fixed visual features. As a minor contribution, this paper therefore constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to learn directly from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields accuracy competitive with state-of-the-art methods. Moreover, in language-based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
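The two objectives the abstract contrasts can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation (their official PyTorch code is layumi/Image-Text-Embedding): the instance loss is written as softmax cross-entropy with one class per image/text group and a single classifier weight matrix assumed to be shared by both modalities, and the ranking loss is written as one common formulation, a bidirectional hinge loss on cosine similarity that sums over all in-batch negatives (the paper's exact negative-sampling scheme is not described here). The function names, the margin of 0.2 and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(dot(v, v)) or 1.0  # leave all-zero vectors unchanged
    return [x / norm for x in v]


def instance_loss(features: List[Vector], labels: List[int], shared_w: List[Vector]) -> float:
    """Mean softmax cross-entropy where every image/text group is its own class.

    Image and text features of group k both carry label k and are scored by the
    same classifier rows `shared_w` (assumed shared across the two modalities)."""
    total = 0.0
    for f, y in zip(features, labels):
        logits = [dot(w, f) for w in shared_w]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[y]
    return total / len(features)


def ranking_loss(images: List[Vector], texts: List[Vector], margin: float = 0.2) -> float:
    """Bidirectional hinge ranking loss on cosine similarity.

    images[i] and texts[i] form a positive pair; every other pairing in the batch
    is treated as a negative, in both the image-to-text and text-to-image directions."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        pos = dot(imgs[i], txts[i])
        for j in range(n):
            if j == i:
                continue
            total += max(0.0, margin - pos + dot(imgs[i], txts[j]))  # image anchor, wrong text
            total += max(0.0, margin - pos + dot(imgs[j], txts[i]))  # text anchor, wrong image
    return total / n


# Toy batch of two image/text groups (hypothetical values).
shared_w = [[2.0, 0.0], [0.0, 2.0]]  # one classifier row per group
img_feats = [[1.0, 0.1], [0.1, 1.0]]
txt_feats = [[0.9, 0.0], [0.0, 0.8]]
labels = [0, 1]

# First stage: instance loss over both modalities, each group as one class.
print(instance_loss(img_feats + txt_feats, labels + labels, shared_w))
# Second stage: the ranking loss is added on top.
print(ranking_loss(img_feats, txt_feats))
```

Reading the abstract's claim that the instance loss "offers better weight initialization for the ranking loss", one would train with the instance loss first and only then add the ranking loss; the sketch only evaluates the two losses and performs no training.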

Code Repositories

pshroff04/Dual_Path_CNN (PyTorch; mentioned in GitHub)
layumi/Image-Text-Embedding (official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark: cross-modal-retrieval-on-cuhk-pedes
Method: Dual Path
Text-to-image MedR: 2

Benchmark: cross-modal-retrieval-on-flickr30k
Method: Dual-Path (ResNet)
Image-to-text R@1: 55.6
Image-to-text R@5: 81.9
Image-to-text R@10: 89.5
Text-to-image R@1: 39.1
Text-to-image R@5: 69.2
Text-to-image R@10: 80.9

Benchmark: cross-modal-retrieval-on-mscoco-1k
Method: Dual-path CNN
Image-to-text R@1: 41.2
Text-to-image R@1: 25.3

Benchmark: nlp-based-person-retrival-on-cuhk-pedes
Method: Dual Path
R@1: 44.4
R@5: 66.26
R@10: 75.07
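R@K (the percentage of queries whose correct match appears among the top K results) and MedR (the median rank of the correct match) can be computed from a query-by-gallery similarity matrix. The sketch below is a simplified illustration, not the benchmarks' official evaluation code: it assumes a one-to-one setup in which query q's only correct match is gallery item q, whereas Flickr30k and MSCOCO pair each image with five captions, so the real protocol has to handle several correct matches per image query. The function names and the toy matrix are hypothetical.

```python
import statistics
from typing import List


def ground_truth_ranks(sim: List[List[float]]) -> List[int]:
    """1-based rank of the correct gallery item for each query.

    sim[q][g] is the similarity between query q and gallery item g; query q is
    assumed to have exactly one correct match, gallery item q. Only strictly
    higher scores push the correct item down, so ties are resolved optimistically."""
    ranks = []
    for q, row in enumerate(sim):
        target = row[q]
        ranks.append(1 + sum(1 for s in row if s > target))
    return ranks


def recall_at_k(ranks: List[int], k: int) -> float:
    """Percentage of queries whose correct match is within the top k results."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks: List[int]) -> float:
    """Median of the ground-truth ranks (MedR); lower is better."""
    return statistics.median(ranks)


# Toy 3x3 similarity matrix (hypothetical values), e.g. text queries vs. images.
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.1],
    [0.1, 0.5, 0.4],
]
ranks = ground_truth_ranks(sim)  # [1, 2, 2]
print(recall_at_k(ranks, 1), recall_at_k(ranks, 5), median_rank(ranks))
```

Higher R@K and lower MedR indicate better retrieval, which is why a MedR of 2 on CUHK-PEDES means the correct image is typically ranked first or second.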
