Adversarial Self-Attention for Language Understanding

Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang

Abstract

Deep neural models (e.g., Transformers) naturally learn spurious features, which create a "shortcut" between the labels and inputs and thus impair generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g., BERT). We propose the Adversarial Self-Attention (ASA) mechanism, which adversarially biases the attention to suppress the model's reliance on shortcut features (e.g., specific keywords) and to encourage its exploration of broader semantics. We conduct a comprehensive evaluation across a wide range of tasks for both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains over naive training run for longer steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in both generalization and robustness.
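To make the "adversarially biases the attention" idea concrete: the sketch below is a minimal, hypothetical PyTorch illustration, not the authors' ASA implementation (which lives in the linked repository and learns structured adversarial attention masks). It approximates the idea with a single FGSM-style gradient-ascent step on an additive attention-logit bias; the module and function names are assumptions for illustration only.

```python
# Hypothetical sketch of adversarially biasing self-attention.
# NOT the paper's exact ASA formulation: we approximate it with a
# one-step worst-case perturbation of the attention logits, then
# train the model against that perturbed attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedSelfAttention(nn.Module):
    """Single-head self-attention that accepts an additive logit bias."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)

    def forward(self, x, bias=None):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        if bias is not None:
            logits = logits + bias  # adversarial perturbation of attention
        return F.softmax(logits, dim=-1) @ v

def asa_step(attn, head, x, labels, eps=0.1):
    """Find the logit bias that most increases the task loss (one
    gradient-ascent step), then return the loss under that bias."""
    seq = x.size(1)
    bias = torch.zeros(x.size(0), seq, seq, requires_grad=True)
    loss = F.cross_entropy(head(attn(x, bias).mean(dim=1)), labels)
    (grad,) = torch.autograd.grad(loss, bias)
    adv_bias = eps * grad.sign()  # worst-case direction, fixed budget eps
    return F.cross_entropy(head(attn(x, adv_bias).mean(dim=1)), labels)

# Toy usage: 8 sequences of length 16, hidden size 32, 2 classes.
attn, head = BiasedSelfAttention(32), nn.Linear(32, 2)
x, y = torch.randn(8, 16, 32), torch.randint(0, 2, (8,))
adv_loss = asa_step(attn, head, x, y)
adv_loss.backward()  # update the model against the worst-case attention
print(adv_loss.item())
```

Training on the loss under the worst-case bias, rather than the clean loss, is what discourages the model from leaning on any single attention shortcut (e.g., one keyword), since that shortcut is exactly what the adversary can most easily disrupt.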

Code Repositories

gingasan/adversarialsa (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metric
machine-reading-comprehension-on-dream | ASA + RoBERTa | Accuracy: 69.2
machine-reading-comprehension-on-dream | ASA + BERT-base | Accuracy: 64.3
named-entity-recognition-on-wnut-2017 | ASA + RoBERTa | F1: 57.3
named-entity-recognition-on-wnut-2017 | ASA + BERT-base | F1: 49.8
natural-language-inference-on-multinli | ASA + BERT-base | Matched: 85
natural-language-inference-on-multinli | ASA + RoBERTa | Matched: 88
natural-language-inference-on-qnli | ASA + RoBERTa | Accuracy: 93.6
natural-language-inference-on-qnli | ASA + BERT-base | Accuracy: 91.4
paraphrase-identification-on-quora-question | ASA + BERT-base | F1: 72.3
paraphrase-identification-on-quora-question | ASA + RoBERTa | F1: 73.7
semantic-textual-similarity-on-sts-benchmark | ASA + RoBERTa | Spearman Correlation: 0.892
semantic-textual-similarity-on-sts-benchmark | ASA + BERT-base | Spearman Correlation: 0.865
sentiment-analysis-on-sst-2-binary | ASA + BERT-base | Accuracy: 94.1
sentiment-analysis-on-sst-2-binary | ASA + RoBERTa | Accuracy: 96.3
