LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Abstract
Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually-rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding. To accurately evaluate LayoutXLM, we also introduce a multilingual form understanding benchmark dataset named XFUND, which includes form understanding samples in 7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese), and key-value pairs are manually labeled for each language. Experiment results show that the LayoutXLM model has significantly outperformed the existing SOTA cross-lingual pre-trained models on the XFUND dataset. The pre-trained LayoutXLM model and the XFUND dataset are publicly available at https://aka.ms/layoutxlm.
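Since the pre-trained checkpoint is public, a minimal loading sketch is shown below. It assumes the Hugging Face `transformers` release of the weights under `microsoft/layoutxlm-base`, which plugs into the LayoutLMv2 model classes (the visual backbone additionally requires detectron2 and torchvision); the image file, OCR words, and bounding boxes are illustrative.

```python
# Minimal sketch, assuming the "microsoft/layoutxlm-base" checkpoint on the
# Hugging Face Hub; LayoutXLM reuses the LayoutLMv2 architecture, so the
# LayoutLMv2 model/image-processor classes are used with the XLM tokenizer.
from PIL import Image
from transformers import (
    AutoModel,
    LayoutLMv2ImageProcessor,
    LayoutXLMProcessor,
    LayoutXLMTokenizer,
)

# apply_ocr=False: the caller supplies OCR words and bounding boxes
# (boxes are expected on a 0-1000 normalized coordinate scale).
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
processor = LayoutXLMProcessor(image_processor, tokenizer)
model = AutoModel.from_pretrained("microsoft/layoutxlm-base")

image = Image.open("form.png").convert("RGB")   # illustrative scanned form page
words = ["Name:", "山田", "太郎"]                # illustrative OCR tokens
boxes = [[60, 40, 150, 70], [160, 40, 220, 70], [230, 40, 290, 70]]

encoding = processor(
    image, words, boxes=boxes,
    truncation=True, padding="max_length", max_length=512,
    return_tensors="pt",
)
outputs = model(**encoding)
# Joint text + image-region embeddings, ready for downstream task heads.
print(outputs.last_hidden_state.shape)
```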
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| document-image-classification-on-rvl-cdip | LayoutXLM | Accuracy: 95.21% |
| key-value-pair-extraction-on-rfund-en | LayoutXLM_base | key-value pair F1: 53.98 |
| key-value-pair-extraction-on-sibr | LayoutXLM | key-value pair F1: 70.45 |
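These results come from attaching task-specific heads to the same pre-trained encoder. A hedged sketch of that setup with the Hugging Face LayoutLMv2 task classes follows; the label counts are illustrative (16 document classes for RVL-CDIP, BIO question/answer/header tags for form-style entity labeling, which is only the first stage of key-value pair extraction).

```python
# Hedged sketch: task heads on top of the same checkpoint, assuming the
# Hugging Face LayoutLMv2 task classes accept the LayoutXLM weights.
from transformers import (
    LayoutLMv2ForSequenceClassification,  # page-level label, e.g. document image classification
    LayoutLMv2ForTokenClassification,     # word-level BIO labels, e.g. form entity labeling
)

# 16 document classes in RVL-CDIP; 7 BIO tags (question/answer/header + O)
# as used for XFUND-style semantic entity recognition -- both illustrative.
doc_classifier = LayoutLMv2ForSequenceClassification.from_pretrained(
    "microsoft/layoutxlm-base", num_labels=16
)
entity_labeler = LayoutLMv2ForTokenClassification.from_pretrained(
    "microsoft/layoutxlm-base", num_labels=7
)
```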