Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Zhang Chong; Guo Ya; Tu Yi; Chen Huan; Tang Jinyang; Zhu Huijia; Zhang Qi; Gui Tao

Abstract
Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs). In this setting, named entity recognition (NER) is treated as a sequence-labeling task that predicts BIO entity tags for tokens, following the typical NLP setting. However, the BIO-tagging scheme relies on the model inputs being in the correct order. That order is not guaranteed in real-world NER on scanned VrDs, where OCR systems recognize and arrange the text. This reading order issue prevents the BIO-tagging scheme from marking entities accurately, which makes it impossible for sequence-labeling methods to predict the correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head that predicts entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents that can reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method and suggest its potential as a universal solution to various information extraction tasks on documents.
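The abstract describes entities as paths in a complete directed graph of tokens. The following is a simplified sketch of how such paths might be decoded, and it is not the authors' implementation. It assumes a hypothetical labeling in which `scores[i][j]` above a threshold (for `i != j`) means token `j` directly follows token `i` inside one entity, and a positive self-loop `scores[i][i]` marks a single-token entity. The function name `decode_token_paths`, the threshold, and the toy tokens are all illustrative assumptions. The paper's actual grid labels and decoding procedure may differ. In the toy OCR order, "Total" (index 1) and "Amount" (index 3) are separated by "No." (index 2). A contiguous BIO span therefore cannot cover exactly {1, 3}, but a path 1 → 3 can.

```python
from typing import Dict, List


def decode_token_paths(scores: List[List[float]], threshold: float = 0.0) -> List[List[int]]:
    """Decode entity token paths from an n x n link-score matrix.

    Hypothetical labeling (an assumption, not the paper's exact scheme):
      scores[i][j] > threshold, i != j  -> token j immediately follows token i
      scores[i][i] > threshold          -> token i is a single-token entity
    """
    n = len(scores)
    # Keep the single best-scoring successor of each token, if it clears the threshold.
    succ: Dict[int, int] = {}
    for i in range(n):
        best_j, best_s = None, threshold
        for j in range(n):
            if i != j and scores[i][j] > best_s:
                best_j, best_s = j, scores[i][j]
        if best_j is not None:
            succ[i] = best_j

    has_pred = set(succ.values())
    paths: List[List[int]] = []
    for i in range(n):
        if i in succ and i not in has_pred:
            # Path start: follow successor links, guarding against cycles.
            path, seen = [i], {i}
            while path[-1] in succ and succ[path[-1]] not in seen:
                nxt = succ[path[-1]]
                path.append(nxt)
                seen.add(nxt)
            paths.append(path)
        elif i not in succ and i not in has_pred and scores[i][i] > threshold:
            # Isolated token with a positive self-loop: single-token entity.
            paths.append([i])
    return paths


# Toy example: tokens in a scrambled OCR order (illustrative data only).
tokens = ["Invoice", "Total", "No.", "Amount", "$12"]
scores = [[-1.0] * 5 for _ in range(5)]
scores[0][2] = 2.0  # "Invoice" -> "No."
scores[1][3] = 3.0  # "Total" -> "Amount" (non-contiguous in OCR order)
scores[4][4] = 1.5  # "$12" as a single-token entity

paths = decode_token_paths(scores)
print([" ".join(tokens[k] for k in p) for p in paths])
```

Keeping only one successor per token keeps each decoded path a simple chain. A real decoder would also need per-entity-type score matrices and a policy for tokens with several competing predecessors.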
Benchmarks
| Benchmark | Method (backbone) | Metric | Score |
|---|---|---|---|
| entity-linking-on-funsd | TPP (LayoutMask) | F1 | 79.20 |
| key-information-extraction-on-cord | TPP (LayoutMask) | F1 | 96.92 |
| key-value-pair-extraction-on-rfund-en | TPP (LayoutLMv3-base) | Key-value pair F1 | 50.27 |
| named-entity-recognition-ner-on-cord-r | TPP (LayoutLMv3) | F1 | 91.85 |
| named-entity-recognition-ner-on-cord-r | TPP (LayoutMask) | F1 | 89.34 |
| named-entity-recognition-ner-on-funsd-r | TPP (LayoutLMv3) | F1 | 80.40 |
| named-entity-recognition-ner-on-funsd-r | TPP (LayoutMask) | F1 | 78.19 |
| reading-order-detection-on-readingbank | TPP (LayoutMask) | Average page-level BLEU | 98.16 |
| reading-order-detection-on-readingbank | TPP (LayoutMask) | Average Relative Distance (ARD) | 0.37 |
| reading-order-detection-on-roor | TPP (LayoutLMv3-base) | Segment-level F1 | 42.96 |
| relation-extraction-on-funsd | TPP (LayoutMask) | F1 | 79.20 |
| semantic-entity-labeling-on-funsd | TPP (LayoutMask) | F1 | 85.16 |
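Most rows above report F1. The sketch below shows one common exact-match definition of entity-level F1, in which an entity counts as correct only if both its type and its ordered token sequence match the gold entity. This definition is an assumption. The evaluation scripts behind the scores above may match entities differently, and the table reports F1 as a percentage, that is, this value × 100. The type labels `KEY`/`VALUE` and the token indices are illustrative only.

```python
from typing import Iterable, Tuple

# An entity is (entity type, ordered tuple of token indices); an illustrative representation.
Entity = Tuple[str, Tuple[int, ...]]


def entity_f1(pred: Iterable[Entity], gold: Iterable[Entity]) -> float:
    """Exact-match entity-level F1 (assumed definition, not the official scorer)."""
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)  # entities matching in both type and token order
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = {("KEY", (0, 2)), ("KEY", (1, 3)), ("VALUE", (4,))}
# The second prediction has the right tokens in the wrong order, so it counts as a miss.
pred = {("KEY", (0, 2)), ("KEY", (3, 1)), ("VALUE", (4,))}
print(round(100 * entity_f1(pred, gold), 2))
```

Comparing ordered token tuples, rather than unordered sets, mirrors the point of the abstract: under this assumed definition, an entity whose tokens are recovered in the wrong reading order does not count as correct.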