Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang

Abstract
Deep neural models (e.g., Transformer) naturally learn spurious features, which create a "shortcut" between the labels and inputs, thus impairing generalization and robustness. This paper advances the self-attention mechanism to a robust variant for Transformer-based pre-trained language models (e.g., BERT). We propose the *Adversarial Self-Attention* mechanism (ASA), which adversarially biases the attentions to effectively suppress the model's reliance on specific features (e.g., particular keywords) and encourage its exploration of broader semantics. We conduct a comprehensive evaluation across a wide range of tasks for both the pre-training and fine-tuning stages. For pre-training, ASA yields remarkable performance gains compared to naive training for longer steps. For fine-tuning, ASA-empowered models outperform naive models by a large margin in both generalization and robustness.
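The abstract only describes the mechanism at a high level, so the following minimal PyTorch sketch illustrates the general idea of adversarially biasing attention: an adversary adds a bounded perturbation to the pre-softmax attention scores to maximize the task loss, and the model is then trained under that biased attention. The class name `BiasedSelfAttention`, the single FGSM-style adversarial step, the toy classifier head, and the budget `eps` are all assumptions for illustration, not the paper's actual ASA formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedSelfAttention(nn.Module):
    """Single-head self-attention that accepts an additive bias on the
    pre-softmax attention scores (hypothetical stand-in for ASA)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, bias=None):
        # x: (batch, seq, dim); bias: (batch, seq, seq) or None
        scores = self.q(x) @ self.k(x).transpose(-2, -1) * self.scale
        if bias is not None:
            scores = scores + bias  # adversary shifts the attention logits
        return scores.softmax(dim=-1) @ self.v(x)

# Toy task head: attention -> mean-pool -> linear classifier (hypothetical).
dim, n_cls = 32, 2
attn, clf = BiasedSelfAttention(dim), nn.Linear(dim, n_cls)

def forward(x, bias=None):
    return clf(attn(x, bias).mean(dim=1))

x = torch.randn(4, 10, dim)              # fake batch: 4 sequences, length 10
y = torch.randint(0, n_cls, (4,))

# Inner maximization (a single FGSM-style step, an assumed simplification):
# find a bounded attention bias that *increases* the loss, i.e. suppresses
# whatever attention pattern the model was shortcutting through.
eps = 1.0                                # perturbation budget (assumed)
bias = torch.zeros(4, 10, 10, requires_grad=True)
loss = F.cross_entropy(forward(x, bias), y)
grad, = torch.autograd.grad(loss, bias)
adv_bias = (eps * grad.sign()).detach()  # ascend the loss within the budget

# Outer minimization: train the model under the adversarial bias.
adv_loss = F.cross_entropy(forward(x, adv_bias), y)
adv_loss.backward()                      # gradients flow to attn and clf
```

The min-max structure mirrors standard adversarial training: the inner step finds the attention perturbation that hurts the model most, and the outer step trains the model to withstand it, which is what discourages reliance on a handful of shortcut tokens.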
Benchmarks
| Benchmark | Methodology | Metric |
|---|---|---|
| Machine Reading Comprehension on DREAM | ASA + RoBERTa | Accuracy: 69.2 |
| Machine Reading Comprehension on DREAM | ASA + BERT-base | Accuracy: 64.3 |
| Named Entity Recognition on WNUT 2017 | ASA + RoBERTa | F1: 57.3 |
| Named Entity Recognition on WNUT 2017 | ASA + BERT-base | F1: 49.8 |
| Natural Language Inference on MultiNLI | ASA + BERT-base | Matched Accuracy: 85 |
| Natural Language Inference on MultiNLI | ASA + RoBERTa | Matched Accuracy: 88 |
| Natural Language Inference on QNLI | ASA + RoBERTa | Accuracy: 93.6 |
| Natural Language Inference on QNLI | ASA + BERT-base | Accuracy: 91.4 |
| Paraphrase Identification on Quora Question Pairs | ASA + BERT-base | F1: 72.3 |
| Paraphrase Identification on Quora Question Pairs | ASA + RoBERTa | F1: 73.7 |
| Semantic Textual Similarity on STS Benchmark | ASA + RoBERTa | Spearman Correlation: 0.892 |
| Semantic Textual Similarity on STS Benchmark | ASA + BERT-base | Spearman Correlation: 0.865 |
| Sentiment Analysis on SST-2 (Binary) | ASA + BERT-base | Accuracy: 94.1 |
| Sentiment Analysis on SST-2 (Binary) | ASA + RoBERTa | Accuracy: 96.3 |