Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

Fangyu Liu; Ivan Vulić; Anna Korhonen; Nigel Collier

Abstract

Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous work has indicated that off-the-shelf MLMs are not effective as universal lexical or sentence encoders without further task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks using annotated task data. In this work, we demonstrate that it is possible to turn MLMs into effective universal lexical and sentence encoders even without any additional data and without any supervision. We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT, which converts MLMs (e.g., BERT and RoBERTa) into such encoders in 20-30 seconds without any additional external knowledge. Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level tasks, across different domains and different languages. Notably, in the standard sentence semantic similarity (STS) tasks, our self-supervised Mirror-BERT model even matches the performance of the task-tuned Sentence-BERT models from prior work. Finally, we delve deeper into the inner workings of MLMs, and suggest some evidence on why this simple approach can yield effective universal lexical and sentence encoders.
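
To make the recipe concrete, the snippet below is a minimal, illustrative sketch of identity fine-tuning with an InfoNCE-style contrastive loss: each string in a batch is paired with itself, dropout inside the encoder makes the two forward passes differ slightly, and the loss pulls a string's two views together while pushing apart other strings in the batch. It assumes PyTorch and Hugging Face Transformers; the model name, mean pooling, learning rate, and temperature are illustrative assumptions, and it omits the paper's additional augmentations (random span masking on inputs), so it is not the authors' exact implementation.

```python
# Minimal sketch of Mirror-BERT-style identity fine-tuning (assumptions noted above).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"            # assumed base MLM
tok = AutoTokenizer.from_pretrained(model_name)
enc = AutoModel.from_pretrained(model_name)
opt = torch.optim.AdamW(enc.parameters(), lr=2e-5)   # assumed hyperparameters
temperature = 0.04

def embed(texts):
    """Mean-pool the last hidden states into one vector per string."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = enc(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

def mirror_step(texts):
    """One contrastive update on a batch of raw strings (no labels needed)."""
    enc.train()                              # keep dropout on so the two views differ
    z1 = F.normalize(embed(texts), dim=-1)
    z2 = F.normalize(embed(texts), dim=-1)   # second view of the identical strings
    logits = z1 @ z2.T / temperature         # scaled cosine similarities
    labels = torch.arange(len(texts))        # each string's positive is itself
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example: a single update on a tiny batch of raw strings.
print(mirror_step(["heart attack", "the cat sat on the mat", "deep learning"]))
```

Because the positives are just the strings themselves, the procedure needs no external knowledge or annotation, which is why a full conversion run can finish in tens of seconds on a modest corpus of strings.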

Code Repositories

cambridgeltl/mirror-bert (official, PyTorch)

Benchmarks

Benchmark (Semantic Textual Similarity)   Methodology                    Spearman Correlation
SICK                                      Mirror-BERT-base (unsup.)      0.703
SICK                                      Mirror-RoBERTa-base (unsup.)   0.706
STS Benchmark                             Mirror-BERT-base (unsup.)      0.764
STS Benchmark                             Mirror-RoBERTa-base (unsup.)   0.787
STS12                                     Mirror-BERT-base (unsup.)      0.674
STS12                                     Mirror-RoBERTa-base (unsup.)   0.648
STS13                                     Mirror-BERT-base (unsup.)      0.796
STS13                                     Mirror-RoBERTa-base (unsup.)   0.819
STS14                                     Mirror-BERT-base (unsup.)      0.713
STS14                                     Mirror-RoBERTa-base (unsup.)   0.732
STS15                                     Mirror-BERT-base (unsup.)      0.814
STS15                                     Mirror-RoBERTa-base (unsup.)   0.798
STS16                                     Mirror-BERT-base (unsup.)      0.743
STS16                                     Mirror-RoBERTa-base (unsup.)   0.780
