PubLayNet: largest dataset ever for document layout analysis

Xu Zhong; Jianbin Tang; Antonio Jimeno Yepes

Abstract

Recognizing the layout of unstructured digital documents is an important step when parsing the documents into a structured machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, the document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models have to be trained by transfer learning from a base model that is pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images in which typical document layout elements are annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also a more effective base model for transfer learning on a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support the development and evaluation of more advanced models for document layout analysis.

Code Repositories

ibm-aur-nlp/PubLayNet (official)
pisalore/FRCNN_teX-annotator
phamquiluan/publaynet (PyTorch)
ibm-aur-nlp/PubTabNet
adlnlp/doc_gcn (TensorFlow)

Benchmarks

Benchmark: document-layout-analysis-on-publaynet-val

Method        Figure  List   Table  Text   Title  Overall
Faster RCNN   0.937   0.883  0.954  0.910  0.826  0.902
Mask RCNN     0.949   0.886  0.960  0.916  0.840  0.910
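
The listing does not state how the Overall value is derived from the per-category scores. The reported numbers are consistent with an unweighted (macro) average over the five categories; that reading is an assumption inferred from the figures, not something the page confirms. A minimal Python sketch that checks this arithmetic, with the helper name `macro_average` chosen here purely for illustration:

```python
# Per-category scores copied from the benchmark listing above.
scores = {
    "Faster RCNN": {"Figure": 0.937, "List": 0.883, "Table": 0.954,
                    "Text": 0.910, "Title": 0.826},
    "Mask RCNN": {"Figure": 0.949, "List": 0.886, "Table": 0.960,
                  "Text": 0.916, "Title": 0.840},
}

# Overall values as reported in the listing.
reported_overall = {"Faster RCNN": 0.902, "Mask RCNN": 0.910}


def macro_average(per_category):
    """Unweighted mean of the per-category scores (hypothetical helper)."""
    return sum(per_category.values()) / len(per_category)


for method, per_category in scores.items():
    mean = macro_average(per_category)
    # Compare the recomputed mean with the reported Overall at 3 decimals.
    matches = abs(mean - reported_overall[method]) < 0.0005
    print(f"{method}: mean={mean:.3f}, "
          f"reported={reported_overall[method]:.3f}, matches={matches}")
```

For both methods the recomputed mean agrees with the reported Overall to three decimal places. Under this reading the Title category, which has the lowest score for both models, pulls the Overall value down the most.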
