Muppet: Massive Multi-task Representations with Pre-Finetuning

Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta

Abstract

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, machine reading comprehension, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when too few tasks are used, up to a critical point (usually above 15), after which performance improves linearly in the number of tasks.
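
To make the recipe concrete, below is a minimal PyTorch sketch of massively multi-task pre-finetuning: a shared encoder with one lightweight head per task, sub-batches sampled across tasks in proportion to dataset size, and per-task loss scaling. The toy encoder, task names, dataset sizes, and scaling heuristic are illustrative assumptions, not the paper's exact setup; in practice the encoder would be initialized from pretrained RoBERTa or BART weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Tiny Transformer standing in for a pretrained encoder such as RoBERTa."""
    def __init__(self, vocab_size=1000, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))
        return h[:, 0]  # pooled representation: first token

class MultiTaskModel(nn.Module):
    """One shared encoder with a lightweight classification head per task."""
    def __init__(self, encoder, task_num_labels, d_model=128):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(d_model, n) for task, n in task_num_labels.items()}
        )

    def forward(self, task, input_ids):
        return self.heads[task](self.encoder(input_ids))

# Hypothetical task inventory (names, label counts, and sizes are illustrative);
# the real recipe spans ~50 datasets across classification, QA, summarization, etc.
tasks = {"sst2": 2, "rte": 2, "boolq": 2, "commonsenseqa": 5}
sizes = torch.tensor([67000.0, 2500.0, 9400.0, 9700.0])  # assumed dataset sizes
names = list(tasks)

model = MultiTaskModel(ToyEncoder(), tasks)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

for step in range(10):  # a few demo steps on random data
    opt.zero_grad()
    loss = 0.0
    # Heterogeneous batch: draw several sub-batches from tasks sampled in
    # proportion to dataset size, and accumulate their losses into one update.
    for _ in range(4):
        task = names[torch.multinomial(sizes / sizes.sum(), 1).item()]
        x = torch.randint(0, 1000, (8, 16))      # fake token ids
        y = torch.randint(0, tasks[task], (8,))  # fake labels
        # Divide each loss by log(num_labels) so label spaces of different
        # sizes contribute comparable gradients (an assumed scaling heuristic).
        scale = torch.log(torch.tensor(float(tasks[task])))
        loss = loss + F.cross_entropy(model(task, x), y) / scale
    loss.backward()
    opt.step()
    print(f"step {step}: loss {loss.item():.3f}")
```

The key design point the sketch illustrates is that every task shares the encoder and gradients from all tasks flow into it within a single optimizer step, which is what pushes the shared representation toward task-general features.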

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| abstractive-text-summarization-on-cnn-daily | MUPPET BART Large | ROUGE-1: 44.45, ROUGE-2: 21.25, ROUGE-L: 41.4 |
| common-sense-reasoning-on-commonsenseqa | MUPPET RoBERTa Large | Accuracy: 79.2 |
| natural-language-inference-on-rte | MUPPET RoBERTa Large | Accuracy: 92.8 |
| question-answering-on-boolq | MUPPET RoBERTa Base | Accuracy: 83.8 |
| question-answering-on-boolq | MUPPET RoBERTa Large | Accuracy: 87.5 |
| sentiment-analysis-on-sst-2-binary | MUPPET RoBERTa Base | Accuracy: 96.7 |
| sentiment-analysis-on-sst-2-binary | MUPPET RoBERTa Large | Accuracy: 97.4 |
| text-summarization-on-gigaword | MUPPET BART Large | ROUGE-1: 40.4, ROUGE-2: 20.54, ROUGE-L: 36.21 |
| text-summarization-on-reddit-tifu | MUPPET BART Large | ROUGE-1: 30.3, ROUGE-2: 11.25, ROUGE-L: 24.92 |
