Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation

Ru Peng, Yawen Zeng, Junbo Zhao

Abstract

Past works on multimodal machine translation (MMT) enhance the bilingual setup by incorporating additional aligned visual information. However, the image-must requirement of multimodal datasets largely hinders MMT's development: it demands aligned triplets of [image, source text, target text]. This limitation is especially troublesome during the inference phase, when an aligned image is not provided, as in the normal NMT setup. Thus, in this work, we introduce IKD-MMT, a novel MMT framework that supports image-free inference via an inversion knowledge distillation scheme. In particular, a multimodal feature generator is executed together with a knowledge distillation module and directly generates the multimodal feature from (only) source texts as input. While a few prior works have explored image-free inference for machine translation, their performance has yet to rival image-must translation. In our experiments, our method is the first image-free approach to comprehensively rival or even surpass (almost) all image-must frameworks, and it achieves state-of-the-art results on the widely used Multi30k benchmark. Our code and data are available at: https://github.com/pengr/IKD-mmt/tree/master.
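
The idea of generating a visual feature from source text alone can be illustrated with a toy sketch. This is not the authors' implementation: the linear generator, the mean-squared-error distillation loss, the synthetic "teacher" image features, and every name below (`generate`, `distill_loss`, `train_generator`, `TEACHER_A`, `TEACHER_C`) are assumptions made purely for illustration. The abstract does not specify the generator architecture or the loss.

```python
# Toy sketch of image-free inference via feature distillation (pure Python).
# All names and numbers are hypothetical; none are taken from the IKD-MMT code.

def generate(weights, bias, text_feat):
    """Student generator: maps a source-text feature to a pseudo image feature."""
    return [sum(w * x for w, x in zip(row, text_feat)) + b
            for row, b in zip(weights, bias)]

def distill_loss(pred, teacher_feat):
    """Mean squared error between generated and teacher (image) features."""
    return sum((p - t) ** 2 for p, t in zip(pred, teacher_feat)) / len(pred)

def train_generator(text_feats, image_feats, lr=0.1, epochs=200):
    """Fit the linear generator with plain SGD on the distillation loss."""
    d_in, d_out = len(text_feats[0]), len(image_feats[0])
    weights = [[0.0] * d_in for _ in range(d_out)]
    bias = [0.0] * d_out
    for _ in range(epochs):
        for x, y in zip(text_feats, image_feats):
            pred = generate(weights, bias, x)
            for j in range(d_out):
                grad = 2.0 * (pred[j] - y[j]) / d_out  # d(loss)/d(pred[j])
                for i in range(d_in):
                    weights[j][i] -= lr * grad * x[i]
                bias[j] -= lr * grad
    return weights, bias

# Synthetic training pairs. In a real system the image features would come
# from a vision encoder; here a fixed linear map stands in for that teacher.
TEXT_FEATS = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
TEACHER_A = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
TEACHER_C = [0.1, -0.2]
IMAGE_FEATS = [generate(TEACHER_A, TEACHER_C, x) for x in TEXT_FEATS]

def mean_loss(weights, bias):
    """Average distillation loss over the toy training set."""
    losses = [distill_loss(generate(weights, bias, x), y)
              for x, y in zip(TEXT_FEATS, IMAGE_FEATS)]
    return sum(losses) / len(losses)

# Training uses (text, image) pairs; images are needed only at this stage.
weights, bias = train_generator(TEXT_FEATS, IMAGE_FEATS)

# Image-free inference: only the source-text feature is required.
pseudo_image_feat = generate(weights, bias, [1.0, 1.0, 1.0])
```

In this sketch, images appear only during training as distillation targets. At inference the generator's output would take the place of the image feature that an image-must model expects.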

Code Repositories

pengr/ikd-mmt (official, PyTorch)

Benchmarks

Benchmark: multimodal-machine-translation-on-multi30k
Methodology: IKD-MMT
Metrics:
BLEU (EN-DE): 41.28
Meteor (EN-DE): 58.93
Meteor (EN-FR): 77.20
