Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

Rafał Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka

Abstract

We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on a decoder capable of unifying a variety of problems involving natural language. The layout is represented as an attention bias and complemented with contextualized visual information, while the core of our model is a pretrained encoder-decoder Transformer. Our novel approach achieves state-of-the-art results in extracting information from documents and answering questions which demand layout understanding (DocVQA, CORD, SROIE). At the same time, we simplify the process by employing an end-to-end model.
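
The abstract states that layout is represented as an attention bias. A minimal sketch of that idea, in plain Python, might look like the following. It is not the authors' implementation: the bucketing scheme, the bucket size, the use of token box centers, and the names `bucket`, `attention_weights`, `h_bias`, and `v_bias` are all assumptions made for illustration. It adds a bias that depends on the horizontal and vertical distance between two tokens to a generic scaled dot-product attention score.

```python
import math

def bucket(distance, bucket_size=50, max_bucket=4):
    """Map a signed pixel distance to a clipped integer bucket (hypothetical scheme)."""
    b = int(distance // bucket_size)
    return max(-max_bucket, min(max_bucket, b))

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, centers, h_bias, v_bias):
    """Attention weights with an additive layout bias.

    queries, keys: lists of equal-length float vectors, one per token.
    centers: list of (x, y) token positions on the page (assumed box centers).
    h_bias, v_bias: dicts mapping a distance bucket to a scalar bias;
                    in a trained model these would be learned parameters.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            # Content term: scaled dot product between query i and key j.
            score = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            # Layout term: bias looked up from horizontal and vertical offsets.
            dx = centers[j][0] - centers[i][0]
            dy = centers[j][1] - centers[i][1]
            score += h_bias[bucket(dx)] + v_bias[bucket(dy)]
            row.append(score)
        weights.append(softmax(row))
    return weights

# Three tokens with identical content vectors: two on the same text line,
# one far below. Only the layout bias can tell them apart.
vectors = [[1.0, 0.0]] * 3
centers = [(0, 0), (60, 0), (0, 300)]
h_bias = {b: 0.0 for b in range(-4, 5)}
v_bias = {b: (0.0 if b == 0 else -2.0) for b in range(-4, 5)}
weights = attention_weights(vectors, vectors, centers, h_bias, v_bias)
```

With these hand-picked bias values, the first token attends equally to itself and to its neighbor on the same line, and less to the token far below, even though all three tokens have the same content vector. This illustrates how a bias term alone can inject layout information into attention.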

Code Repositories

uakarsh/TiLT-Implementation (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                                    Methodology   Metrics
document-image-classification-on-rvl-cdip    TILT-Base     Accuracy: 95.25%
document-image-classification-on-rvl-cdip    TILT-Large    Accuracy: 95.52%
visual-question-answering-on-docvqa-test     TILT-Large    ANLS: 0.8705
visual-question-answering-on-docvqa-test     TILT-Base     ANLS: 0.8392
visual-question-answering-vqa-on             TILT-Large    ANLS: 61.20
