
Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images

Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, Muhammad Zeshan Afzal

Abstract

This paper takes an important step in bridging the performance gap between DETR and R-CNN for graphical object detection. Existing graphical object detection approaches have benefited from recent advances in CNN-based object detection and have achieved remarkable progress. Recently, Transformer-based detectors have considerably boosted generic object detection performance; by using object queries, they eliminate the need for hand-crafted features or post-processing steps such as Non-Maximum Suppression (NMS). However, the effectiveness of these transformer-based detectors has yet to be verified for graphical object detection. Inspired by the latest advances in DETR, we employ an existing detection transformer with a few modifications for graphical object detection. We modify the object queries in different ways: using points, using anchor boxes, and adding positive and negative noise to the anchors to boost performance. These modifications allow better handling of objects with varying sizes and aspect ratios, greater robustness to small variations in object positions and sizes, and improved discrimination between objects and non-objects. We evaluate our approach on four graphical datasets: PubTables, TableBank, NTable and PubLayNet. By integrating the query modifications into DETR, we outperform prior work and achieve new state-of-the-art results, with mAP of 96.9%, 95.7% and 99.3% on TableBank, PubLayNet and PubTables, respectively. The results of extensive ablations show that, as in other applications, transformer-based methods are more effective for document analysis. We hope this study draws more attention to research on using detection transformers in document image analysis.
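The abstract lists three query variants (points, anchor boxes, and anchors with positive and negative noise) without giving details. Below is a minimal sketch of the third idea, loosely modeled on the contrastive denoising scheme used by DINO-style detectors and not taken from the authors' code. During training, each ground-truth box spawns a positive query with small noise, which the decoder would learn to reconstruct, and a negative query with larger noise, which it would learn to classify as "no object". The corner-jitter scheme, the `noise_scale` value, and the names `make_denoising_queries` and `_noisy_box` are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in normalized image coordinates [0, 1].
BoxXYXY = Tuple[float, float, float, float]


def _noisy_box(box: BoxXYXY, lo: float, hi: float, rng: random.Random) -> BoxXYXY:
    """Move each corner coordinate by sign * m * (half box extent), with m drawn from [lo, hi)."""
    x0, y0, x1, y1 = box
    half_w, half_h = (x1 - x0) / 2, (y1 - y0) / 2
    moved = []
    for value, half in ((x0, half_w), (y0, half_h), (x1, half_w), (y1, half_h)):
        sign = rng.choice((-1.0, 1.0))
        magnitude = rng.uniform(lo, hi)
        # Clip to the image so the noisy anchor stays inside [0, 1].
        moved.append(min(max(value + sign * magnitude * half, 0.0), 1.0))
    nx0, ny0, nx1, ny1 = moved
    # Re-order so the result is still a valid box (x0 <= x1, y0 <= y1).
    return (min(nx0, nx1), min(ny0, ny1), max(nx0, nx1), max(ny0, ny1))


def make_denoising_queries(
    gt_boxes: List[BoxXYXY], noise_scale: float = 0.4, seed: int = 0
) -> List[Tuple[BoxXYXY, int]]:
    """Return (anchor, label) pairs: 1 = positive (reconstruct GT), 0 = negative (no object)."""
    rng = random.Random(seed)
    queries: List[Tuple[BoxXYXY, int]] = []
    for box in gt_boxes:
        # Positive query: small noise, magnitude in [0, noise_scale).
        queries.append((_noisy_box(box, 0.0, noise_scale, rng), 1))
        # Negative query: larger noise, magnitude in [noise_scale, 2 * noise_scale).
        queries.append((_noisy_box(box, noise_scale, 2 * noise_scale, rng), 0))
    return queries


if __name__ == "__main__":
    table_box = (0.30, 0.30, 0.70, 0.70)  # a made-up table region
    for anchor, label in make_denoising_queries([table_box]):
        print(label, [round(v, 3) for v in anchor])
```

Keeping the two noise ranges disjoint is what separates the two roles: positives stay close enough to the ground truth to be treated as the same object, while negatives are near misses that teach the model to reject almost-correct boxes.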

Benchmarks

Benchmark: document-layout-analysis-on-publaynet-val
Methodology: DETR
Metrics (AP per category; Overall = mAP):
Figure: 0.975
List: 0.964
Overall: 0.957
Table: 0.981
Text: 0.947
Title: 0.918
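A quick arithmetic check on the benchmark entry: the Overall value equals the unweighted mean of the five per-category APs, which is how a COCO-style mAP averages across categories. The sketch assumes the benchmark computes Overall this way (the arithmetic agrees); the helper name `mean_average_precision` is illustrative.

```python
# Per-category AP values copied from the PubLayNet (val) benchmark entry above.
per_class_ap = {
    "Text": 0.947,
    "Title": 0.918,
    "List": 0.964,
    "Table": 0.981,
    "Figure": 0.975,
}


def mean_average_precision(ap_by_class: dict) -> float:
    """mAP as the unweighted mean of per-category AP (each category counts equally)."""
    return sum(ap_by_class.values()) / len(ap_by_class)


print(round(mean_average_precision(per_class_ap), 3))  # 0.957
```

Because every category is weighted equally, the weakest class (Title, 0.918) pulls the overall score down as much as the strongest class (Table, 0.981) pushes it up, regardless of how many instances each class has.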
