HyperAI


5 months ago

DTrOCR: Decoder-only Transformer for Optical Character Recognition

Fujitake Masato


Abstract

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
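The abstract contrasts encoder-decoder recognizers with a single decoder-only Transformer. A minimal sketch of that idea follows, assuming (as in ViT-style models) that the image is cut into fixed-size patches placed in the same causal sequence as the text, with a separator token between them, and that text is generated greedily one token at a time. The names `patchify`, `causal_mask`, `greedy_decode`, `SEP_ID`, `EOS_ID`, and `toy_model` are hypothetical; the pre-trained Transformer is replaced by a stub callable, so this illustrates the data flow only and is not the paper's implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical special-token IDs; the real vocabulary is not described on this page.
SEP_ID = 0  # boundary between image patches and text tokens
EOS_ID = 1  # end of the recognized text


def patchify(image: Sequence[Sequence[float]], patch: int) -> List[List[float]]:
    """Split a 2-D grayscale image (H x W) into flattened, non-overlapping patches (row-major)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(length)] for i in range(length)]


def greedy_decode(patches: List[List[float]],
                  next_token: Callable[[List[List[float]], List[int]], int],
                  max_len: int = 32) -> List[int]:
    """Generate text token IDs one at a time, conditioned on the patches and prior tokens.

    `next_token` stands in for the decoder-only Transformer: it sees the patch
    sequence plus the text tokens so far (starting with SEP_ID) and returns the next ID.
    """
    tokens = [SEP_ID]
    for _ in range(max_len):
        nxt = next_token(patches, tokens)
        if nxt == EOS_ID:
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the separator


# Usage with a dummy 4 x 8 image and a stub "model" that replays a fixed answer.
image = [[float(r * 8 + c) for c in range(8)] for r in range(4)]
patches = patchify(image, patch=4)  # 2 patches of 16 values each
target = [5, 6, 7]


def toy_model(patches: List[List[float]], tokens: List[int]) -> int:
    step = len(tokens) - 1
    return target[step] if step < len(target) else EOS_ID


print(greedy_decode(patches, toy_model))  # [5, 6, 7]
```

In this layout a single causal mask spans the whole sequence of `len(patches) + len(tokens)` positions, so every text token can attend to all image patches and to earlier text, which is what lets one decoder replace the separate encoder.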


Benchmarks

Benchmark                                      Method        Metric         Result
handwritten-text-recognition-on-iam            DTrOCR 105M   CER (%)        2.38
optical-character-recognition-on-benchmarking  DTrOCR 105M   Accuracy (%)   89.6
optical-character-recognition-on-benchmarking  DTrOCR        Accuracy (%)   89.6
scene-text-recognition-on-cute80               DTrOCR 105M   Accuracy (%)   99.1
scene-text-recognition-on-icdar2013            DTrOCR 105M   Accuracy (%)   99.4
scene-text-recognition-on-icdar2015            DTrOCR 105M   Accuracy (%)   93.5
scene-text-recognition-on-iiit5k               DTrOCR 105M   Accuracy (%)   99.6
scene-text-recognition-on-svt                  DTrOCR 105M   Accuracy (%)   98.9
scene-text-recognition-on-svtp                 DTrOCR 105M   Accuracy (%)   98.6

CER is the character error rate (lower is better); accuracy is higher-is-better.
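The benchmark list reports two kinds of metric: character error rate on IAM and recognition accuracy on the scene-text sets. A minimal sketch of both follows. It assumes CER is computed at corpus level (total Levenshtein edits divided by total reference characters, in percent) and that word accuracy compares lowercase alphanumeric strings, a common scene-text convention; the exact normalization behind these leaderboard numbers is not stated on this page, and `levenshtein`, `cer`, `normalize`, and `word_accuracy` are illustrative names.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level character error rate in percent: total edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return 100.0 * edits / chars


def normalize(word: str) -> str:
    """Assumed normalization: lowercase and keep only alphanumeric characters."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def word_accuracy(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Percentage of word images whose normalized prediction exactly matches the label."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    hits = sum(normalize(r) == normalize(h) for r, h in zip(references, hypotheses))
    return 100.0 * hits / len(references)


print(cer(["kitten"], ["sitting"]))                           # 50.0
print(word_accuracy(["Hello", "WORLD"], ["hello", "w0rld"]))  # 50.0
```

Because CER counts edits rather than whole-word matches, a single wrong character costs little on a long text line, which is why it suits handwritten line recognition, while exact-match accuracy suits short, cropped scene-text words.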
