BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

Stephan Gouws; Yoshua Bengio; Greg Corrado

Abstract

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally efficient model for learning bilingual distributed representations of words. It scales to large monolingual datasets and does not require word-aligned parallel training data. Instead, it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text, sentence-aligned data. This is achieved with a novel sampled bag-of-words cross-lingual objective, which regularizes two noise-contrastive language models for efficient cross-lingual feature learning. Bilingual embeddings learned with the proposed model outperform state-of-the-art methods on a cross-lingual document classification task and on a lexical translation task using WMT11 data.
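The bag-of-words cross-lingual objective can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the penalty is the squared Euclidean distance between the mean word embeddings of a sentence-aligned pair. The function names and toy embeddings are hypothetical, and the two monolingual noise-contrastive losses that this term would regularize are omitted.

```python
# Hypothetical sketch of a bag-of-words cross-lingual penalty.
# Assumption: the penalty is the squared L2 distance between the mean
# embeddings of the two sentences in an aligned pair. All names are
# illustrative and not taken from the official implementation.

def mean_vector(word_ids, embeddings):
    """Average the embedding rows selected by word_ids."""
    dim = len(embeddings[0])
    total = [0.0] * dim
    for w in word_ids:
        for k in range(dim):
            total[k] += embeddings[w][k]
    return [t / len(word_ids) for t in total]


def bow_cross_lingual_loss(src_ids, tgt_ids, src_emb, tgt_emb):
    """Squared L2 distance between the mean embeddings of an aligned pair."""
    src_mean = mean_vector(src_ids, src_emb)
    tgt_mean = mean_vector(tgt_ids, tgt_emb)
    return sum((s - t) ** 2 for s, t in zip(src_mean, tgt_mean))


# Toy 2-dimensional embeddings for a 3-word vocabulary in each language.
en_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
de_emb = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

# Sentence pair whose mean embeddings coincide: no penalty.
loss_close = bow_cross_lingual_loss([0, 1], [0, 1], en_emb, de_emb)
# Sentence pair whose mean embeddings differ: positive penalty.
loss_far = bow_cross_lingual_loss([2, 2], [2, 2], en_emb, de_emb)
```

Because the penalty compares only sentence-level averages, it needs sentence-aligned text but no word-level alignments, which is the property the abstract emphasizes.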

Code Repositories

eske/multivec (mentioned in GitHub)
gouwsmeister/bilbowa (official; mentioned in GitHub)

Benchmarks

Benchmark                                   Methodology   Metrics
document-classification-on-reuters-de-en    BilBOWA       Accuracy: 75
document-classification-on-reuters-en-de    BilBOWA       Accuracy: 86.5
