Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu

Abstract

Transformer has become ubiquitous in the deep learning field. One of the key ingredients behind its success is the self-attention mechanism, which allows fully-connected contextual encoding over input tokens. However, despite its effectiveness in modeling short sequences, self-attention suffers when handling inputs with extremely long-range dependencies, as its complexity grows quadratically with respect to the sequence length. Therefore, long sequences are often encoded by Transformer in chunks using a sliding window. In this paper, we propose Cluster-Former, a novel clustering-based sparse Transformer that performs attention across chunked sequences. The proposed framework is built on two types of Transformer layer: a Sliding-Window Layer and a Cluster-Former Layer, which encode local sequence information and global context jointly and iteratively. This design allows information integration beyond local windows, which is especially beneficial for question answering (QA) tasks that rely on long-range dependencies. Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks.
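
To make the two-layer design concrete, here is a minimal PyTorch sketch (not the authors' released code; the module names, tensor shapes, and the hard nearest-centroid cluster assignment are assumptions for illustration). It shows local attention inside fixed-size sliding windows, followed by attention applied only within clusters of similar hidden states, so distant but related tokens can exchange information:

```python
# Minimal sketch of the Cluster-Former idea (illustrative only, not the paper's code).
import torch
import torch.nn as nn

class SlidingWindowLayer(nn.Module):
    """Local attention: the sequence is split into fixed-size windows."""
    def __init__(self, dim=64, num_heads=4, window=128):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                      # x: (seq_len, dim)
        out = []
        for chunk in x.split(self.window, dim=0):
            c = chunk.unsqueeze(0)             # (1, window, dim)
            upd, _ = self.attn(c, c, c)        # full attention inside the window only
            out.append(upd.squeeze(0))
        return torch.cat(out, dim=0)

class ClusterFormerLayer(nn.Module):
    """Global attention: tokens attend to other tokens in the same cluster."""
    def __init__(self, dim=64, num_heads=4, num_clusters=8):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                      # x: (seq_len, dim)
        # Hard-assign each hidden state to its nearest centroid (k-means style).
        assign = torch.cdist(x, self.centroids).argmin(dim=-1)   # (seq_len,)
        out = x.clone()
        for c in range(self.centroids.size(0)):
            idx = (assign == c).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            grp = x[idx].unsqueeze(0)          # (1, n_c, dim) tokens of one cluster
            upd, _ = self.attn(grp, grp, grp)  # attention across the whole cluster,
            out[idx] = upd.squeeze(0)          # regardless of position in the sequence
        return out

if __name__ == "__main__":
    seq = torch.randn(1000, 64)                # one long input sequence
    seq = SlidingWindowLayer()(seq)            # local context within windows
    seq = ClusterFormerLayer()(seq)            # global context across windows
    print(seq.shape)                           # torch.Size([1000, 64])
```

In the paper the two layer types are stacked and applied iteratively; the sketch runs each once only to show how cluster membership, rather than position, determines which distant tokens attend to each other.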

Benchmarks

Benchmark | Methodology | Metric
language-modelling-on-enwiki8 | Cluster-Former (#C=512) | Bit per Character (BPC): 1.22
open-domain-question-answering-on-searchqa | Cluster-Former (#C=512) | EM: 68.0
question-answering-on-natural-questions-long | Cluster-Former (#C=512) | F1: 76.5
question-answering-on-quasart-t | Cluster-Former (#C=512) | EM: 54
