UnifiedQA: Crossing Format Boundaries With a Single QA System

Daniel Khashabi; Sewon Min; Tushar Khot; Ashish Sabharwal; Oyvind Tafjord; Peter Clark; Hannaneh Hajishirzi

Abstract

Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries are artificial and perhaps unnecessary, given the reasoning abilities we seek to teach are not governed by the format. As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats. UnifiedQA performs on par with 9 different models that were trained on individual datasets themselves. Even when faced with 12 unseen datasets of observed formats, UnifiedQA performs surprisingly well, showing strong generalization from its out-of-format training data. Finally, simply fine-tuning this pre-trained QA model into specialized models results in a new state of the art on 6 datasets, establishing UnifiedQA as a strong starting point for building QA systems.
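The idea of serving several QA formats with one model can be illustrated by flattening every example into a single input string for a text-to-text model. The sketch below is a minimal illustration under stated assumptions, not the authors' preprocessing code: the function name `encode_example`, the literal `\n` separator, the `(a) (b) ...` option labels, and the lowercasing are all assumptions chosen for the example.

```python
# Minimal sketch: cast different QA formats into one text-to-text input.
# All names and conventions here are illustrative assumptions.
from typing import Optional, Sequence

# Assumed separator: a literal backslash-n token surrounded by spaces.
SEPARATOR = " \\n "
OPTION_LABELS = "ABCDEFGH"


def encode_example(
    question: str,
    context: Optional[str] = None,
    choices: Optional[Sequence[str]] = None,
) -> str:
    """Flatten a QA example into a single lowercased input string.

    - question only            -> "question"
    - question + context       -> "question \\n context"
    - question + choices       -> "question \\n (a) ... (b) ..."
    - question + choices + ctx -> "question \\n (a) ... (b) ... \\n context"
    """
    if choices is not None and len(choices) > len(OPTION_LABELS):
        raise ValueError("too many answer options for this sketch")

    parts = [question.strip()]
    if choices:
        options = " ".join(
            f"({OPTION_LABELS[i]}) {choice.strip()}"
            for i, choice in enumerate(choices)
        )
        parts.append(options)
    if context:
        parts.append(context.strip())
    return SEPARATOR.join(parts).lower()


if __name__ == "__main__":
    # Span-style question with a supporting paragraph.
    print(encode_example("Who wrote Hamlet?", context="Hamlet is a play by Shakespeare."))
    # Multiple-choice question without a paragraph.
    print(encode_example("Which is heavier?", choices=["Feather", "Brick"]))
```

Because every format ends up as plain text in and plain text out, the same model and the same training loop can be reused across datasets; only this encoding step knows about the format.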

Code Repositories

allenai/unifiedqa (official; PyTorch; mentioned in GitHub)
facebookresearch/metaicl (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
common-sense-reasoning-on-commonsenseqa | UnifiedQA 11B (fine-tuned) | Accuracy: 79.1
common-sense-reasoning-on-commonsenseqa | UnifiedQA 440M (fine-tuned) | Accuracy: 64
common-sense-reasoning-on-commonsenseqa | T5-XXL 11B (fine-tuned) | Accuracy: 78.1
common-sense-reasoning-on-commonsenseqa | UnifiedQA 11B (zero-shot) | Accuracy: 76.2
common-sense-reasoning-on-commonsenseqa | BART-large 440M (fine-tuned) | Accuracy: 62.5
common-sense-reasoning-on-winogrande | UnifiedQA 406M (fine-tuned) | Accuracy: 73.3
common-sense-reasoning-on-winogrande | UnifiedQA 11B (fine-tuned) | Accuracy: 89.4
multi-task-language-understanding-on-mmlu | GPT 3 | Average (%): 48.9
question-answering-on-openbookqa | UnifiedQA 11B | Accuracy: 87.2
question-answering-on-piqa | UnifiedQA 3B | Accuracy: 85.3
question-answering-on-social-iqa | UnifiedQA 3B | Accuracy: 79.8
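
The CommonsenseQA rows above can be compared directly. The following sketch only does arithmetic on the accuracies listed in the table; the dictionary layout and the helper name `gap` are illustrative assumptions.

```python
# Accuracy values copied from the CommonsenseQA rows of the benchmark table.
commonsenseqa = {
    "UnifiedQA 11B (fine-tuned)": 79.1,
    "UnifiedQA 440M (fine-tuned)": 64.0,
    "T5-XXL 11B (fine-tuned)": 78.1,
    "UnifiedQA 11B (zero-shot)": 76.2,
    "BART-large 440M (fine-tuned)": 62.5,
}


def gap(a: str, b: str) -> float:
    """Accuracy of entry `a` minus accuracy of entry `b`, in points."""
    return round(commonsenseqa[a] - commonsenseqa[b], 1)


if __name__ == "__main__":
    # Size-matched comparisons between UnifiedQA and the listed baselines.
    print(gap("UnifiedQA 11B (fine-tuned)", "T5-XXL 11B (fine-tuned)"))
    print(gap("UnifiedQA 440M (fine-tuned)", "BART-large 440M (fine-tuned)"))
    # Difference between the fine-tuned and zero-shot 11B entries.
    print(gap("UnifiedQA 11B (fine-tuned)", "UnifiedQA 11B (zero-shot)"))
```

On these listed numbers, the fine-tuned UnifiedQA entries are 1.0 and 1.5 points ahead of the size-matched T5-XXL and BART-large entries, and the fine-tuned 11B entry is 2.9 points ahead of the zero-shot 11B entry.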
