FNet: Mixing Tokens with Fourier Transforms

James Lee-Thorp Joshua Ainslie Ilya Eckstein Santiago Ontanon

Abstract

We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens. These linear mixers, along with standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths, our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all sequence lengths on GPUs (and across relatively shorter lengths on TPUs). Finally, FNet has a light memory footprint and is particularly efficient at smaller model sizes; for a fixed speed and accuracy budget, small FNet models outperform Transformer counterparts.
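
The token-mixing step described in the abstract can be illustrated with a short sketch. It assumes the mixing sublayer applies a discrete Fourier transform along the hidden dimension and then along the sequence dimension, keeping only the real part of the result. The function names `dft` and `fourier_mix` are hypothetical, and a naive O(n^2) DFT in pure Python stands in for the FFT routines a real implementation would use. This is an illustration of the idea, not the authors' code.

```python
import cmath


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def fourier_mix(x):
    """Mix a (seq_len x hidden) matrix of token embeddings.

    Assumed form of the mixing step: DFT along the hidden dimension,
    then DFT along the sequence dimension, keeping the real part.
    There are no learned parameters.
    """
    seq_len, hidden = len(x), len(x[0])
    # Transform each token's hidden vector (row-wise DFT).
    rows = [dft(row) for row in x]
    # Transform each hidden channel across tokens (column-wise DFT).
    cols = [dft([rows[t][h] for t in range(seq_len)]) for h in range(hidden)]
    # cols is indexed [hidden][token]; transpose back and drop the imaginary part.
    return [[cols[h][t].real for h in range(hidden)] for t in range(seq_len)]


# Toy input: 3 tokens with hidden size 2 (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = fourier_mix(tokens)
```

Every output entry depends on every input entry, which is how the transform mixes information across tokens without attention weights. The output has the same shape as the input, so under this assumption it can take the place of a self-attention sublayer ahead of the feed-forward layer.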

Code Repositories

Repository	Framework	Notes
erksch/fnet-pytorch	jax	Mentioned in GitHub
2024-MindSpore-1/Code2/tree/main/model-1/fnet	mindspore	-
jaketae/fnet	pytorch	Mentioned in GitHub
HJHGJGHHG/Paddle-FNet	paddle	-
google-research/google-research	-	Official
amoramine/FNet_with_BART_classification	pytorch	Mentioned in GitHub
amoramine/FNet_classification	pytorch	Mentioned in GitHub
yangyucheng000/University/tree/main/model-2/fnet	mindspore	-
labmlai/annotated_deep_learning_paper_implementations	pytorch	-
vineet54/fnet-google-pytorch	pytorch	Mentioned in GitHub
abdelghanibelgaid/FNet-TensorFlow-PyTorch	-	-
facebookresearch/xformers	pytorch	Mentioned in GitHub

Benchmarks

Benchmark	Model	Metrics
linguistic-acceptability-on-cola	FNet-Large	Accuracy: 78%
natural-language-inference-on-multinli	FNet-Large	Matched: 78%, Mismatched: 76%
natural-language-inference-on-multinli	BERT-Large	Matched: 88%, Mismatched: 88%
natural-language-inference-on-qnli	FNet-Large	Accuracy: 85%
natural-language-inference-on-rte	FNet-Large	Accuracy: 69%
paraphrase-identification-on-quora-question	FNet-Large	F1: 85
semantic-textual-similarity-on-mrpc	FNet-Large	Accuracy: 88%
semantic-textual-similarity-on-sts-benchmark	FNet-Large	Spearman Correlation: 0.84
sentiment-analysis-on-sst-2-binary	FNet-Large	Accuracy: 94%
