On the Sentence Embeddings from Pre-trained Language Models

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li

Abstract

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from pre-trained language models without fine-tuning have been found to poorly capture the semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks. To address this issue, we propose to transform the anisotropic sentence embedding distribution into a smooth, isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.
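The abstract describes learning an invertible flow, with unlabeled data only, that maps raw BERT sentence embeddings to a standard Gaussian. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' released TensorFlow implementation; names such as AffineCoupling, Flow, and the placeholder 768-dimensional embeddings are illustrative assumptions.

```python
# Sketch of the BERT-flow idea: learn an invertible flow f so that f(embedding)
# follows a standard Gaussian, by maximizing the change-of-variables log-likelihood
# on unlabeled sentence embeddings. BERT itself stays frozen; only the flow trains.
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer: scale/shift half the dims conditioned on the other half."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                       # keep scales bounded for stability
        z2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=-1)                 # log|det Jacobian| of this layer
        return torch.cat([x1, z2], dim=-1), log_det

class Flow(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))

    def forward(self, x):
        log_det = torch.zeros(x.size(0))
        for layer in self.layers:
            x = torch.flip(x, dims=[-1])        # alternate which dims get transformed
            x, ld = layer(x)
            log_det = log_det + ld
        return x, log_det

def nll(flow, x):
    """Negative log-likelihood of x under the flow with a standard Gaussian base."""
    z, log_det = flow(x)
    log_pz = -0.5 * (z ** 2).sum(dim=-1) - 0.5 * z.size(1) * math.log(2 * math.pi)
    return -(log_pz + log_det).mean()

# Usage sketch: `embeddings` would be pooled BERT outputs for an unlabeled corpus.
embeddings = torch.randn(1024, 768)             # placeholder for real BERT embeddings
flow = Flow(dim=768)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for step in range(100):
    batch = embeddings[torch.randint(0, 1024, (64,))]
    loss = nll(flow, batch)
    opt.zero_grad(); loss.backward(); opt.step()
# After training, similarity is computed on flow(embeddings)[0] instead of the raw embeddings.
```

In this sketch, cosine similarity between sentences is then taken in the flow-transformed (roughly isotropic Gaussian) space rather than the raw anisotropic BERT space.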

Code Repositories

bohanli/BERT-flow (official, TensorFlow), mentioned in GitHub
InsaneLife/dssm (TensorFlow), mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
Semantic Textual Similarity on SICK | BERTbase-flow (NLI) | Spearman Correlation: 0.6544
Semantic Textual Similarity on STS Benchmark | BERTlarge-flow (target) | Spearman Correlation: 0.7226
Semantic Textual Similarity on STS12 | BERTlarge-flow (target) | Spearman Correlation: 0.6520
Semantic Textual Similarity on STS13 | BERTlarge-flow (target) | Spearman Correlation: 0.7339
Semantic Textual Similarity on STS14 | BERTlarge-flow (target) | Spearman Correlation: 0.6942
Semantic Textual Similarity on STS15 | BERTlarge-flow (target) | Spearman Correlation: 0.7492
Semantic Textual Similarity on STS16 | BERTlarge-flow (target) | Spearman Correlation: 0.7763
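The Spearman scores above are conventionally obtained by rank-correlating the model's cosine similarities with the human gold ratings for each sentence pair. A hedged illustration follows; the function name and inputs are hypothetical and not taken from the benchmark page.

```python
# Illustrative STS evaluation: Spearman correlation between cosine similarities
# of paired sentence embeddings and the human-annotated gold scores.
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(emb_a, emb_b, gold_scores):
    """emb_a, emb_b: (N, D) embeddings of paired sentences; gold_scores: (N,) human ratings."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cosine = (a * b).sum(axis=1)
    return spearmanr(cosine, gold_scores).correlation

# e.g. sts_spearman(flow_emb_a, flow_emb_b, gold) would yield values comparable
# to the table above when applied to the corresponding STS test sets.
```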
