Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han

Abstract

Mining a set of meaningful topics organized into a hierarchy is intuitively appealing because topic correlations are ubiquitous in massive text corpora. To account for potential hierarchical topic structures, hierarchical topic models generalize flat topic models by incorporating latent topic hierarchies into their generative modeling process. However, because these models are purely unsupervised, the learned topic hierarchy often deviates from users' particular needs or interests. To guide hierarchical topic discovery with minimal user supervision, we propose a new task, Hierarchical Topic Mining, which takes a category tree described only by category names and aims to mine a set of representative terms for each category from a text corpus, helping users understand the topics they are interested in. We develop a novel joint tree and text embedding method, along with a principled optimization procedure, that simultaneously models the category tree structure and the corpus generative process in the spherical space for effective discovery of category-representative terms. Comprehensive experiments show that our model, named JoSH, mines a high-quality set of hierarchical topics with high efficiency and benefits weakly-supervised hierarchical text classification tasks.
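The abstract states that JoSH discovers category-representative terms in a spherical space but does not spell out the procedure. As a rough illustration of why a spherical space is convenient for this kind of ranking, the toy sketch below assumes category and term embeddings are already given and ranks terms by cosine similarity to a category vector after projecting everything onto the unit sphere. The vectors, the function names `normalize`, `cosine`, and `top_terms`, and the ranking rule are all hypothetical; this is not JoSH's joint tree and text embedding or its optimization procedure.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit sphere via L2 normalization."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: the dot product of the two unit-normalized vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def top_terms(category_vec: Sequence[float],
              term_vecs: Dict[str, Sequence[float]],
              k: int) -> List[str]:
    """Hypothetical ranking rule: return the k terms closest in direction to the category."""
    ranked = sorted(term_vecs,
                    key=lambda t: cosine(category_vec, term_vecs[t]),
                    reverse=True)
    return ranked[:k]


# Toy, made-up 3-d embeddings (not from JoSH or any real corpus).
sports = [1.0, 0.0, 0.0]
terms = {
    "basketball": [0.9, 0.1, 0.0],
    "football": [0.8, 0.2, 0.1],
    "election": [0.0, 1.0, 0.1],
    "senate": [0.1, 0.9, 0.0],
}
print(top_terms(sports, terms, k=2))  # ['basketball', 'football']
```

On the unit sphere, cosine similarity reduces to a plain dot product, so only the direction of an embedding matters. JoSH, by contrast, learns its tree and text embeddings jointly rather than taking them as fixed inputs.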

Code Repositories

yumeng5/JoSH (official)

Benchmarks

Benchmark | Method | MACC | Topic coherence@5
topic-models-on-arxiv | JoSH | 83.24 | 0.0074
topic-models-on-nyt | JoSH | 90.91 | 0.0166