Challenges in Data-to-Document Generation

Sam Wiseman; Stuart M. Shieber; Alexander M. Rush

Abstract

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.

Code Repositories

KaijuML/rotowire-rg-metric (pytorch; mentioned in GitHub)
ratishsp/data2text-1 (mentioned in GitHub)
harvardnlp/boxscore-data (official; mentioned in GitHub)
harvardnlp/data2text (official; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
data-to-text-generation-on-rotowire | Encoder-decoder + conditional copy | BLEU: 14.19
data-to-text-generation-on-rotowire-content | Encoder-decoder + conditional copy | BLEU: 14.49; DLD: 8.68%
data-to-text-generation-on-rotowire-content-1 | Encoder-decoder + conditional copy | Precision: 29.49%; Recall: 36.18%
data-to-text-generation-on-rotowire-relation | Encoder-decoder + conditional copy | Precision: 74.80%; count: 23.72
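
The Precision, Recall, and DLD figures in the benchmarks correspond to the extractive evaluation mentioned in the abstract: records are extracted from a generated document and from its human-written reference, then compared. The following is a minimal Python sketch of how such scores might be computed once the records have been extracted. It is not the authors' evaluation code. The `Record` triple layout, the function names, and the sample records are hypothetical. The sketch compares records as sets, uses the optimal string alignment variant of Damerau-Levenshtein distance, and assumes the ordering score is reported as one minus the length-normalized distance so that higher is better.

```python
from typing import Hashable, Sequence, Tuple

# Hypothetical record layout: (entity, value, record type).
Record = Tuple[str, str, str]


def selection_precision_recall(
    pred: Sequence[Record], gold: Sequence[Record]
) -> Tuple[float, float]:
    """Set-based precision and recall of predicted records against gold records."""
    pred_set, gold_set = set(pred), set(gold)
    overlap = len(pred_set & gold_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    return precision, recall


def osa_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def ordering_similarity(pred: Sequence[Record], gold: Sequence[Record]) -> float:
    """Assumed ordering score: 1 minus the distance normalized by the longer sequence."""
    longest = max(len(pred), len(gold))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(pred, gold) / longest


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from the corpus.
    a = ("Team A", "101", "TEAM-PTS")
    b = ("Player B", "24", "PLAYER-PTS")
    c = ("Player C", "10", "PLAYER-REB")
    d = ("Player D", "7", "PLAYER-AST")
    print(selection_precision_recall([a, b, c], [b, a, d]))
    print(ordering_similarity([a, b, c], [b, a, d]))
```

Comparing records as sets ignores how often a fact is repeated, so a multiset comparison would be needed wherever duplicate mentions should count.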
