5 months ago

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Zhiyuan Zhao Hengrui Kang Bin Wang Conghui He

Abstract

Document Layout Analysis is crucial for real-world document understanding systems, but it encounters a challenging trade-off between speed and accuracy: multimodal methods leveraging both text and visual features achieve higher accuracy but suffer from significant latency, whereas unimodal methods relying solely on visual features offer faster processing speeds at the expense of accuracy. To address this dilemma, we introduce DocLayout-YOLO, a novel approach that enhances accuracy while maintaining speed advantages through document-specific optimizations in both pre-training and model design. For robust document pre-training, we introduce the Mesh-candidate BestFit algorithm, which frames document synthesis as a two-dimensional bin packing problem, generating the large-scale, diverse DocSynth-300K dataset. Pre-training on the resulting DocSynth-300K dataset significantly improves fine-tuning performance across various document types. In terms of model optimization, we propose a Global-to-Local Controllable Receptive Module that is capable of better handling multi-scale variations of document elements. Furthermore, to validate performance across different document types, we introduce a complex and challenging benchmark named DocStructBench. Extensive experiments on downstream datasets demonstrate that DocLayout-YOLO excels in both speed and accuracy. Code, data, and models are available at https://github.com/opendatalab/DocLayout-YOLO.
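To make the "document synthesis as two-dimensional bin packing" framing concrete, the sketch below places rectangular elements on a page with a generic best-fit guillotine heuristic. It is not the paper's Mesh-candidate BestFit algorithm, whose details are not given in the abstract; the names `Rect` and `best_fit_layout`, the largest-first ordering, and the fill-ratio scoring are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h


def best_fit_layout(page_w, page_h, elements):
    """Place (w, h) elements on a page using a best-fit guillotine heuristic.

    Hypothetical illustration only: returns the list of placed Rects and
    silently skips any element that fits in no remaining free cell.
    """
    free = [Rect(0, 0, page_w, page_h)]  # free cells still available
    placed = []
    # Assumption: try larger elements first, a common bin-packing ordering.
    for w, h in sorted(elements, key=lambda e: e[0] * e[1], reverse=True):
        fits = [c for c in free if w <= c.w and h <= c.h]
        if not fits:
            continue
        # Best fit: the free cell that the element fills most completely.
        cell = max(fits, key=lambda c: (w * h) / c.area)
        free.remove(cell)
        placed.append(Rect(cell.x, cell.y, w, h))
        # Guillotine split of the leftover space into two free cells:
        # a strip to the right of the element and a full-width strip below it.
        right = Rect(cell.x + w, cell.y, cell.w - w, h)
        bottom = Rect(cell.x, cell.y + h, cell.w, cell.h - h)
        free.extend(r for r in (right, bottom) if r.area > 0)
    return placed


layout = best_fit_layout(100, 100, [(60, 40), (40, 40), (100, 60), (50, 50)])
for r in layout:
    print(r)
```

In this toy run the 50x50 element is skipped because no remaining free cell is tall enough for it, while the other three elements tile the page exactly. A real synthesis pipeline would presumably also control element categories, scales, and alignment, which this sketch does not attempt.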

Code Repositories

opendatalab/PDF-Extract-Kit (pytorch, mentioned in GitHub)

opendatalab/DocLayout-YOLO (official, pytorch, mentioned in GitHub)

Benchmarks

Benchmark	Methodology	Metrics
document-layout-analysis-on-d4la	DocLayout-YOLO	mAP: 70.3
