HyperAI
3 months ago

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han


Abstract

Transformer-based Large Language Models (LLMs) struggle with inputs exceeding their training context window due to positional out-of-distribution (O.O.D.) issues that disrupt attention. Existing solutions, including fine-tuning and training-free methods, face challenges like inefficiency, redundant interpolation, logit outliers, or loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logits to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on short-context tasks. GALI represents a step toward more robust and generalizable long-text processing in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/adlnlp/Gali.
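To make the two core ideas in the abstract concrete, here is a minimal sketch of (a) reusing pretrained integer positions for part of a long sequence while interpolating the rest into the trained range, and (b) linearly blending attention logits computed at neighboring integer positions for a fractional position. This is an illustrative simplification, not the paper's actual algorithm: the function names, the `n_keep` split, and the plain linear blend are assumptions; GALI's greedy interval assignment and outlier handling differ in detail.

```python
import numpy as np

def fractional_positions(seq_len, train_window, n_keep=None):
    """Map token indices of a long sequence into the pretrained position
    range [0, train_window - 1].

    Illustrative sketch only: keep integer positions for a prefix (reusing
    pretrained intervals) and linearly interpolate the remaining tokens
    into the rest of the trained range.
    """
    if seq_len <= train_window:
        # No extrapolation needed: every token gets a pretrained position.
        return np.arange(seq_len, dtype=float)
    if n_keep is None:
        n_keep = train_window // 2  # hypothetical split point
    head = np.arange(n_keep, dtype=float)  # reused pretrained integer positions
    # Overflow tokens share the remaining range via fractional positions.
    tail = np.linspace(n_keep, train_window - 1, seq_len - n_keep)
    return np.concatenate([head, tail])

def interpolate_logit(logit_floor, logit_ceil, frac):
    """Blend the attention logits computed at the two neighboring integer
    positions of a fractional position (frac in [0, 1]), instead of feeding
    the fractional position itself into the positional encoding."""
    return (1.0 - frac) * logit_floor + frac * logit_ceil

pos = fractional_positions(seq_len=12, train_window=8)
print(pos[0], pos[-1])  # positions stay inside the trained range [0, 7]
```

Interpolating logits rather than position embeddings is what distinguishes this family of methods: the model only ever sees attention scores consistent with the integer positions it was trained on.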

Code Repositories

academycityl/gali (Official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                               | Methodology                     | Metrics
----------------------------------------|---------------------------------|---------------------
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-8k-to-32k)  | Average Score: 42.79
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-4k-to-16k)  | Average Score: 59.21
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-8k-to-16k)  | Average Score: 42.32
long-context-understanding-on-l-eval    | GALI (Llama3-8b-ins-4k-to-32k)  | Average Score: 59.10
long-context-understanding-on-longbench | GALI (Llama3-8b-ins-8k-to-16k)  | Average Score: 45.17
long-context-understanding-on-longbench | GALI (Llama3-8b-ins-4k-to-16k)  | Average Score: 46.22
long-context-understanding-on-longbench | GALI (Llama3-8b-ins-8k-to-32k)  | Average Score: 45.38
