8 months ago

Kim Geewook ; Hong Teakgyu ; Yim Moonbin ; Nam Jeongyeon ; Park Jinyoung ; Yim Jinyeong ; Hwang Wonseok ; Yun Sangdoo ; Han Dongyoon ; Park

Abstract

Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.

OCR-free Document Understanding Transformer
