GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Luo Chuwei; Cheng Changxu; Zheng Qi; Yao Cong

Abstract

Visual information extraction (VIE) plays an important role in Document Intelligence. It is generally divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most existing models learn the geometric representation implicitly, which has been found insufficient for RE, because geometric information is especially crucial for that task. Moreover, we reveal that another factor limiting RE performance is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework for VIE, named GeoLayoutLM. GeoLayoutLM explicitly models geometric relations during pre-training, which we call geometric pre-training. Geometric pre-training is achieved through three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are carefully designed to enrich and enhance the feature representation. In extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores on SER and significantly outperforms the previous state-of-the-art methods on RE (e.g., the RE F1 score on FUNSD rises from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM
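The abstract says GeoLayoutLM models geometric relations explicitly rather than implicitly, but it does not spell out what those relations are. As a rough illustration of what an explicit pairwise geometric relation can look like, the Python sketch below computes, for every ordered pair of text boxes, a coarse 8-way direction bucket and a center-to-center distance normalized by the page diagonal. This is a hypothetical example: the `Box` class, the helper names, the direction bucketing, and the normalization are assumptions made for illustration, not GeoLayoutLM's actual pre-training tasks or relation heads.

```python
import math
from dataclasses import dataclass

# Hypothetical illustration only; not GeoLayoutLM's implementation.


@dataclass(frozen=True)
class Box:
    """Axis-aligned text box in page pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self):
        return ((self.x0 + self.x1) / 2.0, (self.y0 + self.y1) / 2.0)


# Assumed 8-way direction labels, counter-clockwise starting from "east".
DIRECTIONS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def direction_bucket(src: Box, dst: Box) -> str:
    """Coarse direction of dst's center as seen from src's center."""
    (sx, sy), (dx, dy) = src.center(), dst.center()
    # Page y grows downward, so flip the sign of the vertical offset.
    angle = math.atan2(-(dy - sy), dx - sx)  # radians in [-pi, pi]
    # Shift by half a sector so each 45-degree bucket is centered on its label;
    # the final % 8 guards against float rounding landing exactly on 2*pi.
    sector = int(((angle + math.pi / 8) % (2 * math.pi)) // (math.pi / 4)) % 8
    return DIRECTIONS[sector]


def normalized_distance(src: Box, dst: Box, page_w: float, page_h: float) -> float:
    """Center-to-center distance divided by the page diagonal (assumed normalization)."""
    (sx, sy), (dx, dy) = src.center(), dst.center()
    return math.hypot(dx - sx, dy - sy) / math.hypot(page_w, page_h)


def pairwise_geometry(boxes, page_w: float, page_h: float):
    """Map every ordered pair (i, j), i != j, to (direction, normalized distance)."""
    return {
        (i, j): (direction_bucket(a, b), normalized_distance(a, b, page_w, page_h))
        for i, a in enumerate(boxes)
        for j, b in enumerate(boxes)
        if i != j
    }


# Usage: a key label, a value to its right, and another label below the key.
key = Box(10, 10, 60, 20)
value = Box(70, 10, 150, 20)
below = Box(10, 40, 60, 50)
geo = pairwise_geometry([key, value, below], page_w=200, page_h=100)
print(geo[(0, 1)][0])  # E
print(geo[(0, 2)][0])  # S
```

Relations like "the value sits directly to the right of its key" are the kind of layout cue the abstract argues is crucial for RE; how the paper actually defines its three geometry-related pre-training tasks should be checked against the official repository.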

Code Repositories

alibabaresearch/advancedliteratemachinery (official, PyTorch)

Benchmarks

Benchmark | Method | Metric | Score
--- | --- | --- | ---
entity-linking-on-funsd | GeoLayoutLM | F1 | 89.45
key-information-extraction-on-cord | GeoLayoutLM | F1 | 97.97
key-value-pair-extraction-on-rfund-en | GeoLayoutLM | Key-value pair F1 | 69.03
relation-extraction-on-funsd | LayoutLMv3 large | F1 | 80.35
relation-extraction-on-funsd | GeoLayoutLM | F1 | 89.45
semantic-entity-labeling-on-funsd | GeoLayoutLM | F1 | 92.86
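The relation-extraction and entity-linking rows above report F1 over predicted links between entities. As a hedged illustration of how such a score is typically computed, the Python sketch below treats each link as an ordered (head, tail) pair of entity IDs and counts only exact matches. The function name `link_f1` and the exact-match rule are assumptions; the evaluation code behind these benchmark numbers may handle details such as link direction or entity matching differently.

```python
# Sketch of link-level precision/recall/F1 for relation extraction / entity linking.
# Assumption: a link counts as correct only if the exact ordered (head, tail) pair
# appears in the gold annotations; official benchmark scripts may differ.


def link_f1(predicted, gold):
    """Return (precision, recall, f1) over sets of (head_id, tail_id) links."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(0, 1), (2, 3), (4, 5), (6, 7)]
pred = [(0, 1), (2, 3), (4, 7)]  # two correct links, one wrong tail
p, r, f = link_f1(pred, gold)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.6667 0.5 0.5714
```

For a whole test set, micro-averaging would sum true positives, predicted links, and gold links across documents before dividing, rather than averaging per-document F1 scores.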
