Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches

Tim Schopf; Daniel Braun; Florian Matthes

Abstract

Text classification of unseen classes is a challenging Natural Language Processing task and is mainly tackled with two types of approaches. Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have already investigated individual approaches in these categories, the experiments in the literature do not provide a consistent comparison. This paper addresses this gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Different state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE and SBERT-based baselines are proposed, as other baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations increases similarity-based classification results even further.
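The similarity-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' Lbl2TransformerVec implementation: a toy bag-of-words vector stands in for SimCSE or SBERT embeddings, and the function names and class descriptions are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document, class_descriptions):
    """Return the label whose description is most similar to the document."""
    doc_vec = embed(document)
    scores = {
        label: cosine(doc_vec, embed(description))
        for label, description in class_descriptions.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical keyword descriptions per class (not taken from the paper).
CLASS_DESCRIPTIONS = {
    "sports": "sports game team match player score",
    "business": "business market company stock profit trade",
}

label, scores = classify("the team won the match with a late score", CLASS_DESCRIPTIONS)
print(label, scores)
```

No labeled training documents are needed: each class is defined only by its description, and a document is assigned to the class whose representation it is closest to. Swapping `embed` for a sentence-embedding model would move the sketch closer to the embedding-based baselines the abstract mentions.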

Code Repositories

sebischair/lbl2vec (official; mentioned in GitHub)
sebischair/medical-abstracts-tc-corpus (official; mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-text-classification-on-1 | Lbl2TransformerVec | F1-score: 64.69 |
| unsupervised-text-classification-on-ag-news | Lbl2TransformerVec | F1-score: 83.79 |
| unsupervised-text-classification-on-medical | Lbl2Vec | F1-score: 43.03 |
| unsupervised-text-classification-on-medical | Lbl2TransformerVec | F1-score: 56.46 |
| unsupervised-text-classification-on-yahoo | Lbl2TransformerVec | F1-score: 55.84 |
