Graph Convolutions Enrich the Self-Attention in Transformers!

Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park

Abstract

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective one, whose complexity, however, is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
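The abstract describes the mechanism only at a high level. Below is a minimal, single-head sketch in JAX (the language tagged on the official repository): the row-stochastic attention matrix A = softmax(QK^T / sqrt(d)) is viewed as a graph filter, and the plain product A @ V is replaced by a learnable matrix polynomial (w0*I + w1*A + wK*A^K) @ V, with A^K approximated to first order as A + (K-1)(A^2 - A) so only one extra matrix product is needed. The function name gfsa_attention and the scalar coefficients w0, w1, wK are illustrative, not the repository's actual API.

```python
import jax
import jax.numpy as jnp

def gfsa_attention(q, k, v, w0, w1, wK, K=3):
    """Sketch of graph-filter-based self-attention (GFSA), single head.

    Standard self-attention returns A @ v with A = softmax(q k^T / sqrt(d)).
    GFSA instead applies a learnable polynomial graph filter
        (w0 * I + w1 * A + wK * A^K) @ v,
    approximating A^K to first order as A + (K - 1)(A^2 - A),
    which adds only one extra matrix multiplication.
    """
    d = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d)        # (n, n) attention logits
    A = jax.nn.softmax(scores, axis=-1)   # row-stochastic graph filter
    A_K = A + (K - 1) * (A @ A - A)       # first-order approx of A^K
    # w0 * I @ v collapses to w0 * v
    return w0 * v + w1 * (A @ v) + wK * (A_K @ v)

# Hypothetical usage: random queries/keys/values for n=8 tokens of width d=16.
key = jax.random.PRNGKey(0)
kq, kk, kv = jax.random.split(key, 3)
n, d = 8, 16
q = jax.random.normal(kq, (n, d))
k = jax.random.normal(kk, (n, d))
v = jax.random.normal(kv, (n, d))
out = gfsa_attention(q, k, v, w0=0.0, w1=1.0, wK=0.1)  # shape (n, d)
```

With w0 = wK = 0 and w1 = 1 the filter reduces exactly to ordinary self-attention, which is why the authors can drop GFSA into existing Transformers; in practice the coefficients are learned per head.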

Code Repositories

jeongwhanchoi/gfsa — Official (JAX)

Benchmarks

Benchmark | Methodology | Metrics
defect-detection-on-codexglue-devign | PLBART + GFSA | Accuracy: 62.96
defect-detection-on-codexglue-devign | CodeT5-small + GFSA | Accuracy: 63.69
defect-detection-on-codexglue-devign | CodeT5-small | Accuracy: 63.25
defect-detection-on-codexglue-devign | RoBERTa + GFSA | Accuracy: 64.39
defect-detection-on-codexglue-devign | PLBART | Accuracy: 62.63
defect-detection-on-codexglue-devign | CodeT5-base | Accuracy: 63.51
defect-detection-on-codexglue-devign | CodeBERT + GFSA | Accuracy: 64.49
defect-detection-on-codexglue-devign | RoBERTa | Accuracy: 62.88
defect-detection-on-codexglue-devign | CodeBERT | Accuracy: 64.31
defect-detection-on-codexglue-devign | CodeT5-base + GFSA | Accuracy: 64.75
graph-regression-on-pcqm4m-lsc | Graphormer + GFSA | Validation MAE: 0.1193
graph-regression-on-pcqm4mv2-lsc | Graphormer + GFSA | Validation MAE: 0.0860
image-classification-on-imagenet | CaiT-S + GFSA | Top 1 Accuracy: 82.8%
image-classification-on-imagenet | Swin-S + GFSA | Top 1 Accuracy: 83%
image-classification-on-imagenet | DeiT-S-12 + GFSA | Top 1 Accuracy: 81.1%
image-classification-on-imagenet | DeiT-S-24 + GFSA | Top 1 Accuracy: 81.5%
speech-recognition-on-librispeech-100h-test | Branchformer + GFSA | Word Error Rate (WER): 9.6
speech-recognition-on-librispeech-100h-test-1 | Branchformer + GFSA | Word Error Rate (WER): 22.25
speech-recognition-on-librispeech-test-clean | Branchformer + GFSA | Word Error Rate (WER): 2.11
speech-recognition-on-librispeech-test-other | Branchformer + GFSA | Word Error Rate (WER): 4.94
