3 months ago

Regularization for Long Named Entity Recognition

Minbyul Jeong, Jaewoo Kang

Abstract

In named entity recognition (NER), entity length varies and depends on the domain or dataset. Pre-trained language models (PLMs) used to solve NER tasks tend to be biased toward dataset patterns such as length statistics, surface form, and skewed class distribution. These biases hinder the generalization ability of PLMs, which is necessary to handle the many unseen mentions that occur in real-world situations. We propose RegLER, a novel debiasing method that improves predictions for entities of varying lengths. To close the gap between evaluation and real-world situations, we evaluated PLMs on partitioned benchmark datasets containing unseen mention sets. On these partitions, RegLER shows significant improvement on long named entities, which it can predict by debiasing on conjunctions or special characters within entities. Furthermore, most NER datasets have a severe class imbalance, which causes easy-negative examples such as "The" to dominate during training. Our approach alleviates the skewed class distribution by reducing the influence of easy-negative examples. Extensive experiments on the biomedical and general domains demonstrated the generalization capabilities of our method. To facilitate reproducibility and future work, we release our code at https://github.com/minstar/RegLER.
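The idea of evaluating on partitions that contain unseen mentions can be illustrated with a minimal sketch. It assumes a mention counts as "seen" when its normalized surface form also occurs in the training set; the paper's actual partitioning criteria may differ, and the names `normalize`, `partition_mentions`, and the toy mentions are hypothetical and not taken from the RegLER repository.

```python
from typing import Iterable, List, Tuple

# A mention is represented as (surface form, entity type).
Mention = Tuple[str, str]


def normalize(surface: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(surface.lower().split())


def partition_mentions(
    train: Iterable[Mention], test: Iterable[Mention]
) -> Tuple[List[Mention], List[Mention]]:
    """Split test mentions into (seen, unseen) relative to the training set.

    Assumption: "seen" means the normalized surface form appears among the
    training mentions, regardless of entity type.
    """
    seen_forms = {normalize(surface) for surface, _ in train}
    seen: List[Mention] = []
    unseen: List[Mention] = []
    for mention in test:
        target = seen if normalize(mention[0]) in seen_forms else unseen
        target.append(mention)
    return seen, unseen


# Toy data, invented for illustration only.
train = [("breast cancer", "Disease"), ("aspirin", "Chemical")]
test = [
    ("Breast  Cancer", "Disease"),
    ("hereditary breast and ovarian cancer", "Disease"),
]

seen, unseen = partition_mentions(train, test)
print(len(seen), len(unseen))
```

Scoring a model separately on the `unseen` list gives a rough picture of how well it generalizes beyond memorized surface forms, which is where long entities containing conjunctions tend to appear.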

Code Repositories

minstar/PMI
pytorch
Mentioned in GitHub

Benchmarks

Benchmark                                Methodology      Metrics
named-entity-recognition-on-wnut-2017    BiLSTMCRFBP      F1: 42.3
named-entity-recognition-on-wnut-2017    BERT + RegLER    F1: 58.9
