Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

Samuel Humeau; Kurt Shuster; Marie-Anne Lachaux; Jason Weston

Abstract

The use of deep pre-trained bidirectional transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders performing full self-attention over the pair and Bi-encoders encoding the pair separately. The former often performs better, but is too slow for practical use. In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token-level self-attention features. We perform a detailed comparison of all three approaches, including what pre-training and fine-tuning strategies work best. We show our models achieve state-of-the-art results on three existing tasks; that Poly-encoders are faster than Cross-encoders and more accurate than Bi-encoders; and that the best results are obtained by pre-training on large datasets similar to the downstream tasks.
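To make the contrast between the Bi-encoder and the Poly-encoder concrete, here is a minimal scoring sketch in plain Python. It is not the authors' implementation. It assumes the transformer outputs are already available as lists of floats, that the Bi-encoder pools the context by averaging token vectors, and that the Poly-encoder works as in the published paper: a set of learned codes attends over the context tokens to produce global features, and the candidate embedding then attends over those features. The helper names (`attend`, `bi_encoder_score`, `poly_encoder_score`) and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention: weighted average of `keys`,
    with weights softmax(query . key)."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def bi_encoder_score(ctx_tokens, cand_vec):
    """Bi-encoder: context and candidate are encoded separately.
    Assumption: the context vector is the mean of its token vectors."""
    dim = len(ctx_tokens[0])
    ctx_vec = [sum(t[i] for t in ctx_tokens) / len(ctx_tokens) for i in range(dim)]
    return dot(ctx_vec, cand_vec)


def poly_encoder_score(ctx_tokens, codes, cand_vec):
    """Poly-encoder sketch: each code yields one global context feature,
    then the candidate attends over those features before the dot product."""
    global_feats = [attend(code, ctx_tokens) for code in codes]
    ctx_vec = attend(cand_vec, global_feats)
    return dot(ctx_vec, cand_vec)


# Toy inputs (hypothetical values standing in for transformer outputs).
ctx_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codes = [[1.0, 0.0], [0.0, 1.0]]  # would be learned parameters in a real model
cand_vec = [0.5, 2.0]

print(bi_encoder_score(ctx_tokens, cand_vec))
print(poly_encoder_score(ctx_tokens, codes, cand_vec))
```

The global features depend only on the context, so they are computed once per context. Only the final attention over the small set of features and the dot product are repeated per candidate. A Cross-encoder, by contrast, must rerun the whole transformer for every context-candidate pair, which is why the abstract describes it as too slow for practical use.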

Code Repositories

sfzhou5678/PolyEncoder (PyTorch; mentioned in GitHub)
i2r-simmc/i2r-simmc-2020 (PyTorch; mentioned in GitHub)
llStringll/Poly-encoders (PyTorch; mentioned in GitHub)
csong27/collision-bert (PyTorch; mentioned in GitHub)
chijames/Poly-Encoder (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
conversational-response-selection-on-douban-1 | Poly-encoder | MAP: 0.608; MRR: 0.650; P@1: 0.475; R10@1: 0.299; R10@2: 0.494; R10@5: 0.822
conversational-response-selection-on-dstc7 | Bi-encoder | 1-of-100 Accuracy: 66.3%
conversational-response-selection-on-dstc7 | Bi-encoder (v2) | 1-of-100 Accuracy: 70.9%
conversational-response-selection-on-rrs-1 | Poly-encoder | NDCG@3: 0.679; NDCG@5: 0.765
conversational-response-selection-on-ubuntu-1 | Poly-encoder | R10@1: 0.882; R10@2: 0.949; R10@5: 0.990
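
For readers unfamiliar with the ranking metrics above, the following sketch shows one common way to compute recall at k (written R10@k when each example has 10 candidates) and MRR. It assumes exactly one correct candidate per example and breaks ties in favor of the correct candidate. Benchmarks with several correct responses per context may define these metrics differently, so the code is illustrative and does not reproduce the evaluation scripts behind the numbers listed. The function names and the toy scores are hypothetical.

```python
def rank_of_correct(scores, correct_index):
    """1-based rank of the correct candidate under descending score order.
    Ties are resolved optimistically (only strictly higher scores count)."""
    target = scores[correct_index]
    return 1 + sum(1 for s in scores if s > target)


def recall_at_k(all_scores, correct_indices, k):
    """Fraction of examples whose correct candidate is ranked in the top k."""
    hits = sum(
        1
        for scores, correct in zip(all_scores, correct_indices)
        if rank_of_correct(scores, correct) <= k
    )
    return hits / len(all_scores)


def mean_reciprocal_rank(all_scores, correct_indices):
    """Average of 1 / rank of the correct candidate over all examples."""
    total = sum(
        1.0 / rank_of_correct(scores, correct)
        for scores, correct in zip(all_scores, correct_indices)
    )
    return total / len(all_scores)


# Two toy examples with three candidates each (made-up scores).
all_scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.5]]
correct_indices = [0, 2]  # correct candidate is ranked 1st, then 2nd

print(recall_at_k(all_scores, correct_indices, 1))        # 0.5
print(recall_at_k(all_scores, correct_indices, 2))        # 1.0
print(mean_reciprocal_rank(all_scores, correct_indices))  # 0.75
```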
