Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park

Abstract
Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, etc. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) to learn a general yet effective filter, whose complexity, however, is slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
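The abstract's key observation is that the softmax attention matrix can be read as a (row-stochastic) adjacency matrix of a token graph, so standard self-attention amounts to the first-order graph filter H(A) = A applied to the value signal V. A more general polynomial filter such as H(A) = w0·I + w1·A + w2·A² adds an identity term and higher-order mixing, which is one way to counteract oversmoothing. The sketch below illustrates this view in PyTorch; the function names, the second-order truncation, and the weights w0, w1, w2 are illustrative assumptions, not the paper's exact GFSA formulation.

```python
import torch
import torch.nn.functional as F

def standard_self_attention(Q, K, V):
    # Softmax attention matrix A acts like a row-stochastic graph adjacency:
    # each token aggregates signals from all tokens once, i.e. H(A) = A.
    A = F.softmax(Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5, dim=-1)
    return A @ V

def polynomial_filter_attention(Q, K, V, w0=0.3, w1=0.6, w2=0.1):
    # Illustrative generalized filter: H(A) = w0*I + w1*A + w2*A^2.
    # The identity term preserves each token's own representation, and the
    # A^2 term mixes information over two-hop paths in the attention graph.
    A = F.softmax(Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5, dim=-1)
    I = torch.eye(A.shape[-1], device=A.device)
    H = w0 * I + w1 * A + w2 * (A @ A)
    return H @ V

# Toy usage: batch of 2 sequences, 5 tokens, 16-dimensional features.
Q = torch.randn(2, 5, 16)
out = polynomial_filter_attention(Q, Q, Q)
print(out.shape)  # torch.Size([2, 5, 16])
```

Setting w0 = w2 = 0 and w1 = 1 recovers standard self-attention, which makes the extra filter terms a drop-in generalization at a modest additional cost.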