HyperAI
3 months ago

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han


Abstract

Transformer-based Large Language Models (LLMs) struggle with inputs exceeding their training context window due to positional out-of-distribution (O.O.D.) issues that disrupt attention. Existing solutions, including fine-tuning and training-free methods, face challenges like inefficiency, redundant interpolation, logit outliers, or loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logits to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on short-context tasks. GALI represents a step toward more robust and generalizable long-text processing in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/adlnlp/Gali.
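To make the two core ideas in the abstract concrete, here is a minimal sketch of (a) reusing pretrained integer positions for part of a long sequence while interpolating the rest into the trained range, and (b) linearly blending attention logits computed at neighboring integer positions for a fractional position. This is an illustrative simplification, not the paper's actual algorithm: the function names, the `n_keep` split, and the plain linear blend are assumptions; GALI's greedy interval assignment and outlier handling differ in detail.

```python
import numpy as np

def fractional_positions(seq_len, train_window, n_keep=None):
    """Map token indices of a long sequence into the pretrained position
    range [0, train_window - 1].

    Illustrative sketch only: keep integer positions for a prefix (reusing
    pretrained intervals) and linearly interpolate the remaining tokens
    into the rest of the trained range.
    """
    if seq_len <= train_window:
        # No extrapolation needed: every token gets a pretrained position.
        return np.arange(seq_len, dtype=float)
    if n_keep is None:
        n_keep = train_window // 2  # hypothetical split point
    head = np.arange(n_keep, dtype=float)  # reused pretrained integer positions
    # Overflow tokens share the remaining range via fractional positions.
    tail = np.linspace(n_keep, train_window - 1, seq_len - n_keep)
    return np.concatenate([head, tail])

def interpolate_logit(logit_floor, logit_ceil, frac):
    """Blend the attention logits computed at the two neighboring integer
    positions of a fractional position (frac in [0, 1]), instead of feeding
    the fractional position itself into the positional encoding."""
    return (1.0 - frac) * logit_floor + frac * logit_ceil

pos = fractional_positions(seq_len=12, train_window=8)
print(pos[0], pos[-1])  # positions stay inside the trained range [0, 7]
```

Interpolating logits rather than position embeddings is what distinguishes this family of methods: the model only ever sees attention scores consistent with the integer positions it was trained on.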

Code Repositories

academycityl/gali (Official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                               | Methodology                     | Metrics
----------------------------------------|---------------------------------|---------------------
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-8k-to-32k)  | Average Score: 42.79
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-4k-to-16k)  | Average Score: 59.21
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-8k-to-16k)  | Average Score: 42.32
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-4k-to-32k)  | Average Score: 59.10
long-context-understanding-on-longbench | GALI (Llama3-8b-ins-8k-to-16k)  | Average Score: 45.17
long-context-understanding-on-longbench | GALI (Llama3-8b-ins-4k-to-16k)  | Average Score: 46.22
long-context-understanding-on-longbench | GALI (Llama3-8b-ins-8k-to-32k)  | Average Score: 45.38
