DTrOCR: Decoder-only Transformer for Optical Character Recognition
Fujitake Masato

Abstract
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, called the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
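To make the decoder-only idea concrete, here is a toy sketch in plain Python. It is not the DTrOCR implementation. It assumes (our assumption, not stated in the abstract) that the image is cut into fixed-size patches that are linearly projected into the same embedding space as the text tokens, so one causal Transformer decoder reads the patch embeddings followed by the text generated so far and predicts the next token. All names (`patchify`, `causal_self_attention`, `TinyDecoderOnlyOCR`) are hypothetical. The weights are random, and layer norm, feed-forward blocks, multiple heads and layers, and the pre-trained language-model weights are all omitted.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for trained (or pre-trained) parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def patchify(image: Matrix, p: int) -> Matrix:
    """Cut an H x W grayscale image into flattened p x p patches, row-major order."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image must tile evenly into patches"
    return [
        [image[r][c] for r in range(top, top + p) for c in range(left, left + p)]
        for top in range(0, h, p)
        for left in range(0, w, p)
    ]


def causal_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product attention; position i only attends to 0..i."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, k[j])) * scale for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(weights[j] * v[j][d] for j in range(i + 1)) / total
                    for d in range(len(v[0]))])
    return out


class TinyDecoderOnlyOCR:
    """Hypothetical toy model: image patches and text tokens share ONE causal sequence."""

    def __init__(self, patch_dim: int, vocab_size: int, d_model: int = 8,
                 max_positions: int = 64, seed: int = 0):
        rng = random.Random(seed)
        self.w_patch = rand_matrix(patch_dim, d_model, rng)      # patch pixels -> embedding
        self.tok_emb = rand_matrix(vocab_size, d_model, rng)     # token id -> embedding
        self.pos_emb = rand_matrix(max_positions, d_model, rng)  # learned positions
        self.wq, self.wk, self.wv = (rand_matrix(d_model, d_model, rng) for _ in range(3))
        self.w_out = rand_matrix(d_model, vocab_size, rng)       # hidden -> vocab logits

    def next_token_logits(self, patches: Matrix, token_ids: Sequence[int]) -> List[float]:
        # The image patches form the prefix; the text generated so far follows them.
        seq = matmul(patches, self.w_patch) + [self.tok_emb[t] for t in token_ids]
        assert len(seq) <= len(self.pos_emb), "sequence longer than max_positions"
        seq = [[a + b for a, b in zip(e, self.pos_emb[i])] for i, e in enumerate(seq)]
        hidden = causal_self_attention(seq, self.wq, self.wk, self.wv)
        return matmul([hidden[-1]], self.w_out)[0]  # predict from the last position

    def greedy_decode(self, patches: Matrix, bos: int, eos: int, max_len: int) -> List[int]:
        ids = [bos]
        for _ in range(max_len):
            logits = self.next_token_logits(patches, ids)
            nxt = max(range(len(logits)), key=logits.__getitem__)  # argmax
            if nxt == eos:
                break
            ids.append(nxt)
        return ids[1:]  # drop BOS; EOS is never appended


image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
patches = patchify(image, 2)  # four 2x2 patches, each flattened to 4 values
model = TinyDecoderOnlyOCR(patch_dim=4, vocab_size=10)
print(model.greedy_decode(patches, bos=0, eos=1, max_len=5))
```

With random weights the decoded ids are meaningless. The point is the data flow: there is no separate image encoder, and the image tokens are simply the prefix of the sequence the language model continues. In a real system the decoder would start from a generative language model pre-trained on a large corpus, as the abstract describes.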
Benchmarks
| Benchmark | Model | Metric | Value |
|---|---|---|---|
| handwritten-text-recognition-on-iam | DTrOCR 105M | CER (%) | 2.38 |
| optical-character-recognition-on-benchmarking | DTrOCR 105M | Accuracy (%) | 89.6 |
| optical-character-recognition-on-benchmarking | DTrOCR | Accuracy (%) | 89.6 |
| scene-text-recognition-on-cute80 | DTrOCR 105M | Accuracy (%) | 99.1 |
| scene-text-recognition-on-icdar2013 | DTrOCR 105M | Accuracy (%) | 99.4 |
| scene-text-recognition-on-icdar2015 | DTrOCR 105M | Accuracy (%) | 93.5 |
| scene-text-recognition-on-iiit5k | DTrOCR 105M | Accuracy (%) | 99.6 |
| scene-text-recognition-on-svt | DTrOCR 105M | Accuracy (%) | 98.9 |
| scene-text-recognition-on-svtp | DTrOCR 105M | Accuracy (%) | 98.6 |
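The benchmark results use two kinds of metric: character error rate (CER) on IAM, where lower is better, and accuracy on the other benchmarks, where higher is better. The sketch below shows one common way to compute them. CER is the total character-level Levenshtein distance divided by the total number of reference characters, and word accuracy is the percentage of predictions that match their reference exactly. This is an illustrative assumption, not the benchmarks' official scoring scripts. Real protocols may normalize case, strip punctuation, or average per line rather than over the whole corpus, and any of these choices changes the numbers. The function names are hypothetical.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, start=1):
        cur = [i]
        for j, hc in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,                # delete rc
                           cur[j - 1] + 1,             # insert hc
                           prev[j - 1] + (rc != hc)))  # substitute (or match)
        prev = cur
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * edits / chars


def word_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Percentage of predictions that match their reference exactly."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    return 100.0 * sum(r == h for r, h in zip(refs, hyps)) / len(refs)


print(cer(["hello"], ["helo"]))                       # 20.0
print(word_accuracy(["cat", "dog"], ["cat", "dot"]))  # 50.0
```

Corpus-level CER weights long lines more heavily than a per-line average would, which is one reason scores from different evaluation scripts are not always directly comparable.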