5 months ago

OCR-free Document Understanding Transformer

Abstract

Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.
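
The abstract names cross-entropy loss as the pre-training objective. As a rough illustration only, the sketch below computes a token-level cross-entropy for a decoder that emits one score vector per output token. The function name `token_cross_entropy`, the toy vocabulary size, and the plain-list representation are assumptions made for this example; they are not taken from the Donut codebase.

```python
import math


def token_cross_entropy(logits, target_ids):
    """Mean negative log-likelihood of the target tokens.

    logits:     one list of unnormalized scores per decoding step
                (hypothetical stand-in for decoder outputs).
    target_ids: index of the correct token at each step.
    """
    if not target_ids or len(logits) != len(target_ids):
        raise ValueError("logits and target_ids must be non-empty and equal in length")

    total = 0.0
    for step_logits, target in zip(logits, target_ids):
        # Numerically stable log-sum-exp gives the log of the softmax denominator.
        peak = max(step_logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
        # -log softmax(step_logits)[target]
        total += log_z - step_logits[target]
    return total / len(target_ids)


# Toy usage: a 4-token vocabulary and two decoding steps.
uniform = token_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])
confident = token_cross_entropy([[0.0, 0.0, 5.0, 0.0], [4.0, 0.0, 0.0, 0.0]], [2, 0])
print(uniform, confident)
```

With uniform scores the loss equals the log of the vocabulary size, and it falls toward zero as the score of the correct token grows, which is the behaviour a text-generating model is trained to reach.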

Benchmarks

Benchmark | Methodology | Metrics
document-image-classification-on-rvl-cdip | Donut | Accuracy: 95.3%
key-value-pair-extraction-on-rfund-en | Donut | key-value pair F1: 24.54
key-value-pair-extraction-on-sibr | Donut | key-value pair F1: 17.26
visual-question-answering-on-docvqa-test | Donut | ANLS: 0.675
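
The DocVQA row reports ANLS (Average Normalized Levenshtein Similarity). For readers unfamiliar with the metric, here is a minimal sketch of how it is commonly computed: each prediction is scored against every accepted answer by one minus the normalized edit distance, similarities whose normalized distance reaches a threshold (commonly 0.5) are set to zero, the best score per question is kept, and the scores are averaged. The lowercasing and whitespace stripping are an assumption about normalization, and the function names are chosen for this example; this is not the official evaluation script.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def anls(predictions, references, threshold=0.5):
    """Average over questions of the best thresholded similarity.

    predictions: one predicted answer string per question.
    references:  one list of accepted answer strings per question.
    """
    scores = []
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            # Answers that are too far from the reference earn no credit.
            best = max(best, 1.0 - nl if nl < threshold else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


# Toy usage with made-up answers: one near miss and one wrong answer.
print(anls(["total 100", "foo"], [["total 10o"], ["bar"]]))
```

Because near misses earn partial credit, the metric tolerates small character-level errors in the generated answer while still giving zero for answers that are mostly wrong.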
