TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions

Qiang Ning, Hao Wu, Rujun Han, Nanyun Peng, Matt Gardner, Dan Roth

Abstract

A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However, current machine reading comprehension benchmarks have practically no questions that test temporal phenomena, so systems trained on these benchmarks have no capacity to answer questions such as "what happened before/after [some event]?" We introduce TORQUE, a new English reading comprehension benchmark built on 3.2k news snippets with 21k human-generated questions querying temporal relationships. Results show that RoBERTa-large achieves an exact-match score of 51% on the test set of TORQUE, about 30% behind human performance.

Benchmarks

Benchmark                      Methodology      Metrics
question-answering-on-torque   RoBERTa-large    C: 34.5, EM: 51.1, F1: 75.2
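The EM and F1 figures above can be read more concretely with a small sketch. It assumes that each question's answer is a set of strings (for example, event mentions from the passage), that exact match requires the predicted set to equal the gold set, and that F1 is computed from the overlap between the two sets. These definitions, and the function names `exact_match`, `f1_score`, and `evaluate`, are illustrative assumptions and not the benchmark's official evaluation script; the C metric is not covered here.

```python
from typing import Iterable, Set, Tuple


def exact_match(predicted: Set[str], gold: Set[str]) -> float:
    """Return 1.0 if the predicted answer set equals the gold set, else 0.0."""
    return 1.0 if predicted == gold else 0.0


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Set-overlap F1 between predicted and gold answers (assumed definition)."""
    # Treat two empty sets as a perfect match (e.g. a question with no answer).
    if not predicted and not gold:
        return 1.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Average EM and F1 over (predicted, gold) pairs, as percentages."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, gold) pair")
    em = sum(exact_match(p, g) for p, g in pairs) / len(pairs)
    f1 = sum(f1_score(p, g) for p, g in pairs) / len(pairs)
    return 100.0 * em, 100.0 * f1


if __name__ == "__main__":
    # Hypothetical predictions for two questions; not data from the benchmark.
    examples = [
        ({"attacked", "fled"}, {"attacked", "fled"}),  # exact match
        ({"attacked", "said"}, {"attacked"}),          # partial overlap
    ]
    em, f1 = evaluate(examples)
    print(f"EM: {em:.1f}  F1: {f1:.1f}")
```

Under these assumptions, a gap between EM and F1 like the one in the table would indicate that a model often recovers most of the correct answers for a question without matching the full set exactly.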