CODAH: An Adversarially Authored Question-Answer Dataset for Common Sense

Michael Chen; Mike D'Arcy; Alisa Liu; Jared Fernandez; Doug Downey

Abstract

Commonsense reasoning is a critical AI capability, but it is difficult to construct challenging datasets that test common sense. Recent neural question answering systems, based on large pre-trained models of language, have already achieved near-human-level performance on commonsense knowledge benchmarks. These systems do not possess human-level common sense, but are able to exploit limitations of the datasets to achieve human-level scores. We introduce the CODAH dataset, an adversarially constructed evaluation dataset for testing common sense. CODAH forms a challenging extension to the recently proposed SWAG dataset, which tests commonsense knowledge using sentence-completion questions that describe situations observed in video. To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems. Workers are rewarded for submissions that models fail to answer correctly both before and after fine-tuning (in cross-validation). We create 2.8k questions via this procedure and evaluate the performance of multiple state-of-the-art question answering systems on our dataset. We observe a significant gap between human performance, which is 95.3%, and the best baseline accuracy of 67.5%, achieved by the BERT-Large model.
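
The evaluation described in the abstract (accuracy on sentence-completion questions, measured in cross-validation) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the authors' code: the `Question` layout, the `accuracy` and `k_folds` helpers, and the trivial baseline that always picks the first completion.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout (an assumption, not the dataset's actual format):
# (prompt, candidate completions, index of the correct completion).
Question = Tuple[str, Sequence[str], int]


def accuracy(
    questions: List[Question],
    predict: Callable[[str, Sequence[str]], int],
) -> float:
    """Fraction of questions for which `predict` picks the correct completion."""
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(
        1 for prompt, choices, answer in questions
        if predict(prompt, choices) == answer
    )
    return correct / len(questions)


def k_folds(
    questions: List[Question], k: int
) -> List[Tuple[List[Question], List[Question]]]:
    """Split into k (train, held_out) pairs; each question is held out exactly once."""
    folds = [questions[i::k] for i in range(k)]
    return [
        ([q for j, fold in enumerate(folds) if j != i for q in fold], folds[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    # Toy questions invented for this sketch; they are not from CODAH.
    toy = [
        ("prompt a", ["x", "y"], 0),
        ("prompt b", ["x", "y"], 1),
        ("prompt c", ["x", "y"], 0),
        ("prompt d", ["x", "y"], 0),
    ]

    def always_first(prompt: str, choices: Sequence[str]) -> int:
        # Stand-in for a real model: always selects the first completion.
        return 0

    for train, held_out in k_folds(toy, k=2):
        # A real system would be fine-tuned on `train` before scoring `held_out`.
        print(len(train), len(held_out), accuracy(held_out, always_first))
```

In this sketch a fine-tuned model would replace `always_first`, and the per-fold accuracies would be averaged to give a single cross-validated score.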

Code Repositories

Websail-NU/AQuA (Official, pytorch)
iit-nlp-research/chatgpt-crawler (pytorch, Mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
common-sense-reasoning-on-codah | BERT Large | Accuracy: 69.6
question-answering-on-codah | BERT Large | Accuracy: 69.6
