SSTQA Semi-structured Tabular Question Answering Dataset

SSTQA is a benchmark dataset for semi-structured table question answering, released in 2025 by Shanghai Jiao Tong University, Simon Fraser University, Tsinghua University, and other institutions. It accompanies the paper "ST-Raptor: LLM-Powered Semi-Structured Table Question Answering". The benchmark tests how well large language models and table question-answering systems understand and answer questions about real tables with complex layouts, such as merged cells, hierarchical headers, and multi-level nesting.
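A small example makes the layout problem concrete. The minimal Python sketch below shows why a merged cell in a hierarchical header defeats a simple column-name lookup. It assumes a hypothetical encoding in which each header row is a list of (label, colspan) pairs. This is not SSTQA's storage format or the ST-Raptor method. The helper names `expand_header_row`, `column_paths`, and `lookup` are illustrative only. The idea is that a question such as "What was Alice's final score?" has to be resolved against the full header path ("Score", "Final"), because no column is labelled with that path directly.

```python
from typing import Dict, List, Tuple

# Hypothetical header encoding for illustration only (not the SSTQA file format):
# each header row is a list of (label, colspan) pairs, where colspan > 1 models
# a cell merged horizontally across several columns.
HeaderRow = List[Tuple[str, int]]


def expand_header_row(row: HeaderRow) -> List[str]:
    """Repeat each merged cell's label across every column it spans."""
    expanded: List[str] = []
    for label, span in row:
        expanded.extend([label] * span)
    return expanded


def column_paths(header_rows: List[HeaderRow]) -> List[Tuple[str, ...]]:
    """Turn multi-level header rows into one top-to-bottom label path per column."""
    levels = [expand_header_row(row) for row in header_rows]
    width = len(levels[0])
    if any(len(level) != width for level in levels):
        raise ValueError("every header row must cover the same number of columns")
    return [tuple(level[col] for level in levels) for col in range(width)]


def lookup(header_rows: List[HeaderRow], data_row: List[str], path: Tuple[str, ...]) -> str:
    """Return the cell in data_row whose full header path equals path."""
    index: Dict[Tuple[str, ...], int] = {
        p: i for i, p in enumerate(column_paths(header_rows))
    }
    return data_row[index[path]]


if __name__ == "__main__":
    # Two header levels: "Score" is merged over the "Midterm" and "Final" columns.
    header = [
        [("Student", 1), ("Score", 2)],
        [("Name", 1), ("Midterm", 1), ("Final", 1)],
    ]
    print(column_paths(header))
    # [('Student', 'Name'), ('Score', 'Midterm'), ('Score', 'Final')]
    print(lookup(header, ["Alice", "85", "92"], ("Score", "Final")))  # 92
```

The sketch handles only horizontal merges in the header rows. Real tables of the kind described here also contain vertical merges and deeper nesting, which a full system would need to model explicitly.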

The dataset contains 102 complex real-world tables and 764 corresponding questions, covering 19 representative application scenarios. The tables include nested cells, multi-level headers, and irregular layouts, reflecting the structural complexity of real-world data. Question-answer pairs were built through a combination of automatic generation and manual review, and are divided into three difficulty levels: easy, medium, and hard. The tasks range from direct retrieval to complex reasoning, which makes the question set both diverse and challenging.

SSTQA addresses shortcomings of existing semi-structured table datasets, which tend to be small, structurally simple, and disconnected from real applications. Its tables are structurally complex and span many scenarios. Its questions have clearly graded difficulty levels and high-quality annotations. The dataset is suitable for training and evaluating large multimodal models and table question-answering systems, and serves as an important benchmark for advancing table understanding and its practical applications.