Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Abstract

In this technical report, we present the Ring-linear model series, specifically including Ring-mini-linear-2.0 and Ring-flash-linear-2.0. Ring-mini-linear-2.0 comprises 16B parameters and 957M activations, while Ring-flash-linear-2.0 contains 104B parameters and 6.1B activations. Both models adopt a hybrid architecture that effectively integrates linear attention and softmax attention, significantly reducing I/O and computational overhead in long-context inference scenarios. Compared to a 32 billion parameter dense model, this series reduces inference cost to 1/10, and compared to the original Ring series, the cost is also reduced by over 50%. Furthermore, through systematic exploration of the ratio between different attention mechanisms in the hybrid architecture, we have identified the currently optimal model structure. Additionally, by leveraging our self-developed high-performance FP8 operator library, linghe, overall training efficiency has been improved by 50%. Benefiting from the high alignment between the training and inference engine operators, the models can undergo long-term, stable, and highly efficient optimization during the reinforcement learning phase, consistently maintaining SOTA performance across multiple challenging complex reasoning benchmarks.
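The abstract describes a hybrid stack that interleaves linear (kernel-based, O(n)) attention with standard softmax attention at a tuned ratio. The PyTorch sketch below illustrates one way such interleaving can be expressed; it is not the authors' implementation. The names (HybridStack, AttentionLayer, linear_attention, softmax_every), the elu(x)+1 feature map, and the 1:3 layer ratio are illustrative assumptions, and causal masking is omitted for brevity.

```python
# Minimal sketch of a hybrid linear/softmax attention stack (assumed design,
# not the Ring-linear implementation). Hypothetical names throughout.
import torch
import torch.nn as nn
import torch.nn.functional as F


def linear_attention(q, k, v):
    """Kernelized linear attention, O(n) in sequence length.

    Uses the common elu(x)+1 feature map; the paper's exact linear
    attention variant is not specified in the abstract. Non-causal.
    """
    q = F.elu(q) + 1.0                                   # (B, H, N, D)
    k = F.elu(k) + 1.0
    kv = torch.einsum("bhnd,bhne->bhde", k, v)           # accumulate K^T V
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)


def softmax_attention(q, k, v):
    """Standard scaled-dot-product (softmax) attention, O(n^2)."""
    return F.scaled_dot_product_attention(q, k, v)


class AttentionLayer(nn.Module):
    def __init__(self, dim, n_heads, use_linear):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.attn_fn = linear_attention if use_linear else softmax_attention

    def forward(self, x):                                # x: (B, N, dim)
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, N, self.n_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        out = self.attn_fn(q, k, v).transpose(1, 2).reshape(B, N, -1)
        return x + self.proj(out)                        # residual connection


class HybridStack(nn.Module):
    """Interleave linear and softmax attention at a fixed ratio,
    e.g. softmax_every=4 gives 1 softmax layer per 3 linear layers."""
    def __init__(self, dim=256, n_heads=8, n_layers=8, softmax_every=4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionLayer(dim, n_heads,
                           use_linear=(i % softmax_every != softmax_every - 1))
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack()
    x = torch.randn(2, 1024, 256)
    print(model(x).shape)  # torch.Size([2, 1024, 256])
```

Because most layers avoid the quadratic softmax computation and its KV-cache I/O, a stack of this shape is what enables the long-context inference-cost reductions the abstract reports; the optimal linear-to-softmax ratio is the quantity the authors say they explored systematically.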
