TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Abstract
Text recognition is a long-standing research problem in document digitization. Existing approaches are usually built on a CNN for image understanding and an RNN for character-level text generation, and a separate language model is often needed as a post-processing step to improve the overall accuracy. In this paper, we propose TrOCR, an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective; it can be pre-trained on large-scale synthetic data and fine-tuned on human-labeled datasets. Experiments show that TrOCR outperforms the current state-of-the-art models on printed, handwritten, and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.
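Because the architecture pairs a pre-trained vision Transformer encoder with a pre-trained text Transformer decoder, recognition reduces to autoregressive wordpiece generation conditioned on image patches. Below is a minimal inference sketch, assuming the Hugging Face `transformers` port of the released checkpoints (the official code lives at the URL above; `line.png` is a hypothetical input file):

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# The processor bundles the image feature extractor and the wordpiece tokenizer.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# A single cropped text-line image; TrOCR expects line-level inputs.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The decoder generates wordpieces autoregressively from the encoded image.
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```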
Benchmarks
| Benchmark | Model | Test CER (%) | Test WER (%) |
|---|---|---|---|
| handwritten-text-recognition-on-iam | TrOCR-small (62M) | 4.22 | - |
| handwritten-text-recognition-on-iam | TrOCR-base (334M) | 3.42 | - |
| handwritten-text-recognition-on-iam | TrOCR-large (558M) | 2.89 | - |
| handwritten-text-recognition-on-iam-line | TrOCR | 3.4 | - |
| handwritten-text-recognition-on-lam-line | TrOCR | 3.6 | 11.6 |
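CER, the metric reported above, is the character-level edit distance between a prediction and its reference, normalized by the reference length (WER is the same computation over words). A minimal sketch of the computation follows; this is an illustrative implementation, not the benchmarks' evaluation script:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

# Two substituted characters in an 11-character reference -> ~18.18% CER.
print(f"{cer('transformer', 'transfromer') * 100:.2f}%")
```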