


CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry

David Tschirschwitz, Volker Rodehorst


Abstract

Reproducibility and replicability are critical pillars of empirical research, particularly in machine learning, where they depend not only on the availability of models but also on the datasets used to train and evaluate those models. In this paper, we introduce the Construction Industry Steel Ordering List (CISOL) dataset, which was developed with a focus on transparency to ensure reproducibility, replicability, and extensibility. CISOL provides a valuable new research resource and highlights the importance of diverse datasets, even in niche application domains such as table extraction in civil engineering.

CISOL is distinctive in that it contains real-world civil engineering documents from industry. The dataset contains more than 120,000 annotated instances in over 800 document images, positioning it as a medium-sized dataset that provides a robust foundation for Table Structure Recognition (TSR) and Table Detection (TD) tasks.

In benchmarking on CISOL, the YOLOv8 model achieves 67.22 mAP@0.5:0.95:0.05 (Track A, TD + TSR), outperforming the TSR-specific TATR model. This highlights the effectiveness of CISOL as a benchmark for advancing TSR, especially in specialized domains.

Benchmarks

Benchmark | Methodology | Metrics
object-detection-on-cisol-track-a-td-tsr | YOLO v8.1m | mAP@0.5:0.95:0.05: 67.22
object-detection-on-cisol-track-b-tsr-only | YOLO v8.1m | mAP@0.5:0.95:0.05: 61.39
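The metric mAP@0.5:0.95:0.05 is commonly read as COCO-style mean average precision: AP is computed at each IoU threshold 0.50, 0.55, ..., 0.95 and the results are averaged over thresholds and classes. The sketch below illustrates that averaging for a single image. It is not the CISOL evaluation code; it assumes greedy score-ordered matching and all-point interpolated AP (COCO's official evaluator uses 101-point interpolation, multi-image accumulation, and area/crowd handling). The function names and the class label "cell" are hypothetical.

```python
from typing import Dict, List, Tuple

# Axis-aligned box as (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
# A detection: (confidence score, box).
Detection = Tuple[float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Detection], gts: List[Box], thr: float) -> float:
    """AP for one class in one image at IoU threshold `thr` (all-point interpolation)."""
    # Match detections, highest score first, to the unmatched ground-truth
    # box they overlap most; the match counts only if IoU >= thr.
    matched = [False] * len(gts)
    hits = []
    for _, box in sorted(preds, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                o = iou(box, gt)
                if o > best_iou:
                    best_iou, best_j = o, j
        if best_j >= 0 and best_iou >= thr:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    # Precision and recall after each detection in ranked order.
    recalls, precisions, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        recalls.append(tp / len(gts))
        precisions.append(tp / rank)
    # Make precision non-increasing in recall, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def map_50_95(preds: Dict[str, List[Detection]], gts: Dict[str, List[Box]]) -> float:
    """Mean AP over classes and IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    # Skip classes with no ground truth, since recall is undefined for them.
    classes = [c for c, boxes in gts.items() if boxes]
    aps = [average_precision(preds.get(c, []), gts[c], t) for t in thresholds for c in classes]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    gts = {"cell": [(0, 0, 10, 10)]}          # hypothetical class label
    preds = {"cell": [(0.9, (0, 0, 10, 8))]}  # IoU with the ground truth = 0.8
    print(map_50_95(preds, gts))  # 0.7
```

In this example the detection overlaps the ground truth with IoU 0.8, so it counts as a true positive at the seven thresholds from 0.50 to 0.80 and as a miss at 0.85, 0.90, and 0.95, giving 7/10 = 0.7. Averaging over the stricter thresholds is why this metric rewards tight boxes, which matters for dense row, column, and cell layouts in TSR.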
