TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

Kexin Wang, Nils Reimers, Iryna Gurevych

Abstract

Learning sentence embeddings often requires a large amount of labeled data. However, for most tasks and domains, labeled data is seldom available and creating it is expensive. In this work, we present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE) which outperforms previous approaches by up to 6.4 points. It can achieve up to 93.1% of the performance of in-domain supervised approaches. Further, we show that TSDAE is a strong domain adaptation and pre-training method for sentence embeddings, significantly outperforming other approaches like Masked Language Model. A crucial shortcoming of previous studies is the narrow evaluation: Most work mainly evaluates on the single task of Semantic Textual Similarity (STS), which does not require any domain knowledge. It is unclear if these proposed methods generalize to other domains and tasks. We fill this gap and evaluate TSDAE and other recent approaches on four different datasets from heterogeneous domains.
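The sketch below illustrates the training setup described in the abstract: an input sentence is corrupted, the encoder compresses it into a single fixed-size vector, and the decoder must reconstruct the original sentence from that vector alone. It assumes a recent release of the sentence-transformers library, which ships DenoisingAutoEncoderDataset and DenoisingAutoEncoderLoss for this purpose; the base model, training sentences, and hyperparameters are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal TSDAE training sketch (assumes a recent sentence-transformers release).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

# Unlabeled, in-domain sentences (placeholder data for illustration).
train_sentences = [
    "TSDAE learns sentence embeddings without labeled data.",
    "The encoder compresses a corrupted sentence into a single vector.",
    "The decoder reconstructs the original sentence from that vector.",
]

# Encoder: a pre-trained Transformer followed by pooling into one fixed-size vector.
word_embedding_model = models.Transformer("bert-base-uncased")
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), "cls"
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# The dataset applies the denoising corruption (token deletion) on the fly,
# yielding (corrupted sentence, original sentence) pairs.
train_dataset = DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)

# Decoder weights are tied to the encoder; the loss is the cross-entropy of
# reconstructing the original sentence from the sentence embedding alone.
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)

model.save("output/tsdae-model")
```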

Code Repositories

UKPLab/useb (pytorch, mentioned in GitHub)
kwang2049/pytorch-bertflow (official, pytorch, mentioned in GitHub)
climsocana/tecb-de (mentioned in GitHub)
kwang2049/useb (official, pytorch, mentioned in GitHub)
ukplab/pytorch-bertflow (official, pytorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metric
information-retrieval-on-cqadupstack | TSDAE | mAP@100: 0.145
paraphrase-identification-on-pit | TSDAE | AP: 69.2
paraphrase-identification-on-turl | TSDAE | AP: 76.8
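As a rough illustration of how such a model is applied in the retrieval and paraphrase-identification settings above, the sketch below ranks candidate sentences by cosine similarity between embeddings. The model path and example sentences are placeholders; the official evaluation code is in the UKPLab/useb repository listed earlier.

```python
# Sketch: scoring sentence pairs with a trained TSDAE model via cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("output/tsdae-model")  # path from the training sketch

queries = ["How do I reset a forgotten root password?"]
candidates = [
    "Recovering the root password on Ubuntu",
    "How to change the desktop wallpaper",
]

# Embed both sides once, then rank candidates by cosine similarity.
query_emb = model.encode(queries, convert_to_tensor=True)
cand_emb = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_emb, cand_emb)  # shape: (num_queries, num_candidates)
print(scores)
```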
