Abstract

Official Gazettes are a rich source of information relevant to the public. Their careful examination may lead to the detection of fraud and irregularities, which may in turn help prevent the mismanagement of public funds. This paper presents a dataset composed of documents from the Official Gazette of the Federal District, containing both samples annotated with their document source and unlabeled samples. We train, evaluate and compare a transfer-learning-based model that uses ULMFiT against traditional bag-of-words models that use SVM and Naive Bayes as classifiers. We find the SVM to be competitive: its performance is only marginally worse than that of ULMFiT, while it trains and runs inference much faster and is less computationally expensive. Finally, we conduct an ablation analysis to assess the performance impact of the individual ULMFiT components.

Marcelo Magalhães Silva de Sousa, Teófilo Emidio de Campos, Pedro Henrique Luz de Araujo

Inferring the source of official texts: can SVM beat ULMFiT?
