3 months ago

A Self-Training Approach for Short Text Clustering

Chris Develder, Thomas Demeester, Lucas Sterckx, Amir Hadifar

Abstract

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts. Low-dimensional continuous representations or embeddings can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
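
As a rough illustration of how cluster assignments can act as supervision, the sketch below computes soft assignments for already-encoded texts, sharpens them into a target distribution, and measures the gap between the two. The abstract does not state the exact objective, so the Student's t kernel, the squared-and-renormalised target, and the KL loss are assumptions borrowed from common deep clustering practice; the function names and toy values are hypothetical, and the gradient step that would update the encoder is left out.

```python
import math

def soft_assign(embeddings, centroids, alpha=1.0):
    """Soft assignments q[i][j] from a Student's t kernel (an assumed choice)."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q

def target_distribution(q):
    """Sharpen q: square each entry, divide by the soft cluster size, renormalise rows."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        scaled = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(scaled)
        p.append([s / total for s in scaled])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all texts; a trainer would minimise this w.r.t. the encoder."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )

# Toy 2-d encodings of four short texts and two centroids (hypothetical values).
embeddings = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
centroids = [[0.0, 0.0], [1.0, 1.0]]

q = soft_assign(embeddings, centroids)   # current soft cluster assignments
p = target_distribution(q)               # pseudo-labels used as supervision
loss = kl_divergence(p, q)               # self-training signal
```

In a full training loop, `embeddings` would come from the encoder half of the autoencoder applied to sentence embeddings, the centroids would typically be initialised by a clustering algorithm such as k-means, and `loss` would be backpropagated to the encoder weights and centroids.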

Benchmarks

Benchmark | Methodology | Metrics
short-text-clustering-on-searchsnippets | SIF + Aut., Self-Train. | Acc: 77.1
