7 months ago

Te\'ofilo Em\'\idio de Campos Pedro Henrique Luz de Araujo Nilton Correia da Silva Fabricio Ataides Braz

Abstract

This paper describes VICTOR, a novel dataset built from Brazil{'}s Supreme Court digitalized legal documents, composed of more than 45 thousand appeals, which includes roughly 692 thousand documents{---}about 4.6 million pages. The dataset contains labeled text data and supports two types of tasks: document type classification; and theme assignment, a multilabel problem. We present baseline results using bag-of-words models, convolutional neural networks, recurrent neural networks and boosting algorithms. We also experiment using linear-chain Conditional Random Fields to leverage the sequential nature of the lawsuits, which we find to lead to improvements on document type classification. Finally we compare a theme classification approach where we use domain knowledge to filter out the less informative document pages to the default one where we use all pages. Contrary to the Court experts{'} expectations, we find that using all available data is the better method. We make the dataset available in three versions of different sizes and contents to encourage explorations of better models and techniques.

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

7 months ago

Natural Language Processing

Dataset

Multi-Task Learning

Te\'ofilo Em\'\idio de Campos Pedro Henrique Luz de Araujo Nilton Correia da Silva Fabricio Ataides Braz

Abstract

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

7 months ago

Natural Language Processing

Dataset

Multi-Task Learning

Te\'ofilo Em\'\idio de Campos Pedro Henrique Luz de Araujo Nilton Correia da Silva Fabricio Ataides Braz

Abstract

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

VICTOR: a Dataset for Brazilian Legal Documents Classification | Papers | HyperAI

Command Palette

VICTOR: a Dataset for Brazilian Legal Documents Classification

Te\'ofilo Em\'\idio de Campos Pedro Henrique Luz de Araujo Nilton Correia da Silva Fabricio Ataides Braz

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

VICTOR: a Dataset for Brazilian Legal Documents Classification

Te\'ofilo Em\'\idio de Campos Pedro Henrique Luz de Araujo Nilton Correia da Silva Fabricio Ataides Braz

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

VICTOR: a Dataset for Brazilian Legal Documents Classification

Te\'ofilo Em\'\idio de Campos Pedro Henrique Luz de Araujo Nilton Correia da Silva Fabricio Ataides Braz

Abstract

Build AI with AI

HyperAI Newsletters