Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park

Abstract
Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective filter, whose complexity, however, is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
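The abstract's key observation is that the softmax attention matrix can be read as a (row-stochastic) adjacency matrix of a token graph, so standard self-attention amounts to the first-order graph filter H(A) = A applied to the value signal V. A more general polynomial filter such as H(A) = w0·I + w1·A + w2·A² adds an identity term and higher-order mixing, which is one way to counteract oversmoothing. The sketch below illustrates this view in PyTorch; the function names, the second-order truncation, and the weights w0, w1, w2 are illustrative assumptions, not the paper's exact GFSA formulation.

```python
import torch
import torch.nn.functional as F

def standard_self_attention(Q, K, V):
    # Softmax attention matrix A acts like a row-stochastic graph adjacency:
    # each token aggregates signals from all tokens once, i.e. H(A) = A.
    A = F.softmax(Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5, dim=-1)
    return A @ V

def polynomial_filter_attention(Q, K, V, w0=0.3, w1=0.6, w2=0.1):
    # Illustrative generalized filter: H(A) = w0*I + w1*A + w2*A^2.
    # The identity term preserves each token's own representation, and the
    # A^2 term mixes information over two-hop paths in the attention graph.
    A = F.softmax(Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5, dim=-1)
    I = torch.eye(A.shape[-1], device=A.device)
    H = w0 * I + w1 * A + w2 * (A @ A)
    return H @ V

# Toy usage: batch of 2 sequences, 5 tokens, 16-dimensional features.
Q = torch.randn(2, 5, 16)
out = polynomial_filter_attention(Q, Q, Q)
print(out.shape)  # torch.Size([2, 5, 16])
```

Setting w0 = w2 = 0 and w1 = 1 recovers standard self-attention, which makes the extra filter terms a drop-in generalization at a modest additional cost.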