Native Sparse Attention


Native Sparse Attention (NSA) is a natively trainable sparse attention mechanism proposed by researchers from DeepSeek, Peking University, and the University of Washington on February 27, 2025. It aims to address the computational bottleneck of long-sequence modeling by combining algorithmic innovation with hardware-aligned optimization to achieve efficient long-context modeling. The method was introduced in the paper "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention", which won the ACL 2025 Best Paper Award.

Pre-trained on a 27B-parameter Transformer backbone, NSA matches or exceeds the performance of full-attention models on general benchmarks, long-context tasks, and reasoning tasks. When processing 64k-length sequences, NSA achieves significant speedups in decoding, forward propagation, and backpropagation.
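
To give a rough intuition for why sparse attention reduces the long-sequence cost, the sketch below shows a simplified block-wise top-k sparse attention in NumPy. This is only an illustrative approximation, not DeepSeek's NSA implementation: NSA combines compressed-token, selected-token, and sliding-window branches with hardware-aligned Triton kernels, none of which are reproduced here, and the names `block_size` and `top_k` are illustrative assumptions.

```python
# Minimal sketch: each query attends only to the top_k most relevant key/value
# blocks (chosen by a coarse score), instead of all keys as in full attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block_size=64, top_k=4):
    seq_len, d = k.shape
    n_blocks = (seq_len + block_size - 1) // block_size

    # Coarse scores: query against the mean key of each block.
    block_means = np.stack([
        k[i * block_size:(i + 1) * block_size].mean(axis=0)
        for i in range(n_blocks)
    ])                                          # (n_blocks, d)
    coarse = q @ block_means.T / np.sqrt(d)     # (n_queries, n_blocks)

    out = np.zeros((q.shape[0], v.shape[1]))
    for qi in range(q.shape[0]):
        # Keep only the top_k key/value blocks for this query.
        chosen = np.argsort(coarse[qi])[-top_k:]
        idx = np.concatenate([
            np.arange(b * block_size, min((b + 1) * block_size, seq_len))
            for b in sorted(chosen)
        ])
        scores = softmax(q[qi] @ k[idx].T / np.sqrt(d))
        out[qi] = scores @ v[idx]
    return out

# Usage: 1024 tokens, 64-dim head; each query touches 4 * 64 = 256 keys
# rather than all 1024, which is where the compute savings come from.
rng = np.random.default_rng(0)
q = rng.standard_normal((1024, 64))
k = rng.standard_normal((1024, 64))
v = rng.standard_normal((1024, 64))
print(block_sparse_attention(q, k, v).shape)  # (1024, 64)
```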

