LeNER-Br: a Dataset for Named Entity Recognition in Brazilian Legal Text

Teófilo E. de Campos, Samuel Couto, Pedro H. Luz de Araujo, Paulo Bermejo, Matheus Stauffer, Renato R. R. de Oliveira

Abstract

Named entity recognition systems have untapped potential to extract information from legal documents, which can improve information retrieval and decision-making processes. In this paper, a dataset for named entity recognition in Brazilian legal documents is presented. Unlike other Portuguese language datasets, this dataset is composed entirely of legal documents. In addition to tags for persons, locations, time entities and organizations, the dataset contains specific tags for law and legal case entities. To establish a set of baseline results, we first performed experiments on another Portuguese dataset: Paramopama. This evaluation demonstrates that LSTM-CRF gives results that are significantly better than those previously reported. We then retrained LSTM-CRF on our dataset and obtained F1 scores of 97.04% and 88.82% for Legislation and Legal case entities, respectively. These results show the viability of the proposed dataset for legal applications.

Benchmarks

Benchmark: named-entity-recognition-on-lener-br
Methodology: LSTM-CRF
Metrics:
Micro F1 (Exact Span): 0.8661
Micro F1 (Tokens): 0.9253
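
The two metrics above differ in what counts as a correct prediction. The following is a minimal sketch of that difference, not the evaluation script used for the benchmark. It assumes IOB2 tags, assumes that "exact span" means both boundaries and the entity type must match, and assumes that the token-level score compares entity types per token while ignoring the `B-`/`I-` prefix and skipping `O` tokens. The labels `LAW` and `PER` and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one IOB2-tagged sentence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # A stray I- tag with no open span is treated as a span start (lenient).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall from pooled counts."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def span_micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro F1 where a prediction counts only if the whole span matches."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = extract_spans(g), extract_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    return f1(tp, n_pred, n_gold)


def token_micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro F1 over individual tokens (assumed definition, see lead-in)."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        for g_tag, p_tag in zip(g, p):
            g_type = None if g_tag == "O" else g_tag[2:]
            p_type = None if p_tag == "O" else p_tag[2:]
            if g_type is not None:
                n_gold += 1
            if p_type is not None:
                n_pred += 1
            if g_type is not None and g_type == p_type:
                tp += 1
    return f1(tp, n_pred, n_gold)


# One sentence where the predicted LAW span is one token too short.
gold = [["B-LAW", "I-LAW", "I-LAW", "O", "B-PER"]]
pred = [["B-LAW", "I-LAW", "O", "O", "B-PER"]]
span_score = span_micro_f1(gold, pred)    # 0.5: the truncated span is a miss
token_score = token_micro_f1(gold, pred)  # about 0.857: 3 of 4 entity tokens hit
```

In this toy case a single boundary error costs a whole span under the exact-span metric but only one token under the token-level metric, which is consistent with the token-level score being the higher of the two reported values.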
