3 months ago

Balancing Methods for Multi-label Text Classification with Long-Tailed Class Distribution

Yi Huang, Buse Giledereli, Abdullatif Köksal, Arzucan Özgür, Elif Ozkirimli

Abstract

Multi-label text classification is a challenging task because it requires capturing label dependencies. It becomes even more challenging when the class distribution is long-tailed. Resampling and re-weighting are common approaches to the class imbalance problem; however, they are not effective when there is label dependency in addition to class imbalance, because they result in oversampling of common labels. Here, we introduce the application of balancing loss functions for multi-label text classification. We perform experiments on a general-domain dataset with 90 labels (Reuters-21578) and a domain-specific dataset from PubMed with 18,211 labels. We find that a distribution-balanced loss function, which inherently addresses both the class imbalance and label linkage problems, outperforms commonly used loss functions. Distribution balancing methods have been successfully used in the image recognition field. Here, we show their effectiveness in natural language processing. Source code is available at https://github.com/Roche/BalancedLossNLP.
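The abstract's point that resampling oversamples common labels under label dependency can be illustrated with a toy example. The sketch below is not taken from the paper or its repository: the four-document dataset, the label names `common` and `rare`, and the helpers `label_counts` and `oversample_rare` are all hypothetical, chosen only to show the effect.

```python
from collections import Counter


def label_counts(docs):
    """Count how many documents carry each label."""
    counts = Counter()
    for labels in docs:
        counts.update(labels)
    return counts


def oversample_rare(docs, rare_label, copies):
    """Naive resampling: append `copies` extra copies of every document
    that carries `rare_label` (hypothetical helper for illustration)."""
    extra = [labels for labels in docs if rare_label in labels] * copies
    return docs + extra


# Hypothetical toy dataset: the rare label only appears together
# with the common label, i.e. the two labels are dependent.
docs = [{"common"}, {"common"}, {"common"}, {"common", "rare"}]

before = label_counts(docs)
after = label_counts(oversample_rare(docs, "rare", copies=3))

print("before:", dict(before))
print("after: ", dict(after))
```

In this toy setting, duplicating the single document that carries the rare label also duplicates its common label, so the common label still appears in every document after resampling. This is the kind of co-occurrence effect that motivates handling the imbalance in the loss function rather than in the sampling.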

Code Repositories

blessu/balancedlossnlp (official, PyTorch)
Roche/BalancedLossNLP (official, PyTorch)
