5 months ago

Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs

Lokesh Mishra; Sohayl Dhibi; Yusik Kim; Cesar Berrospi Ramis; Shubham Gupta; Michele Dolfi; Peter Staar

Abstract

Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult because tables vary widely in both structure and content. We propose Statements, a novel domain-agnostic data structure for extracting quantitative facts and related information. We propose translating tables to statements as a new supervised deep-learning universal information extraction task. We introduce SemTabNet, a dataset of over 100K annotated tables. Investigating a family of T5-based Statement Extraction Models, we find that our best model generates statements that are 82% similar to the ground truth (compared to a baseline of 21%). We demonstrate the advantages of statements by applying our model to over 2700 tables from ESG reports. The homogeneous nature of statements permits exploratory data analysis on the expansive information found in large collections of ESG reports.
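To make the idea of a homogeneous record concrete, here is a minimal Python sketch. It is not the paper's schema or model: the `Statement` field names, the helper functions `table_to_statements` and `mean_by_property`, and the toy table for the made-up organization "ExampleCorp" are all assumptions introduced for illustration. The hand-written flattening rule stands in for the learned T5-based extraction, which the abstract does not specify in detail.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Statement:
    # Field names are illustrative assumptions, not the paper's schema.
    subject: str   # who the fact is about
    prop: str      # which quantity is reported
    value: float   # the numeric value
    unit: str      # unit of measurement
    year: int      # reporting period


def table_to_statements(header, rows, subject, unit_by_prop):
    """Flatten a simple table (first column = year) into one record per cell.

    This rule-based flattening is a stand-in for a learned extraction model.
    """
    statements = []
    for row in rows:
        year = int(row[0])
        for prop, cell in zip(header[1:], row[1:]):
            statements.append(
                Statement(subject, prop, float(cell), unit_by_prop[prop], year)
            )
    return statements


def mean_by_property(statements):
    """Group uniform records by (property, unit) and average their values."""
    groups = defaultdict(list)
    for s in statements:
        groups[(s.prop, s.unit)].append(s.value)
    return {key: mean(values) for key, values in groups.items()}


# Toy, made-up data purely for demonstration.
header = ["year", "water consumption", "waste generated"]
rows = [["2021", "120.0", "30.0"], ["2022", "100.0", "20.0"]]
units = {"water consumption": "megalitres", "waste generated": "tonnes"}

statements = table_to_statements(header, rows, "ExampleCorp", units)
summary = mean_by_property(statements)
```

Because every record has the same shape, the aggregation step does not need to know how the original table was laid out, which is the property the abstract points to when it mentions exploratory data analysis.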

Code Repositories

ds4sd/semtabnet (official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark: information-extraction-on-semtabnet
Methodology: T5
Metrics: average Tree Similarity Score: 81.76
