Test-Time Training on Nearest Neighbors for Large Language Models

Moritz Hardt, Yu Sun

Abstract

Many recent efforts augment language models with retrieval, by adding retrieved data to the input context. For this approach to succeed, the retrieved data must be added at both training and test time. Moreover, as input length grows linearly with the size of retrieved data, cost in computation and memory grows quadratically for modern Transformers. To avoid these complications, we simply fine-tune the model on retrieved data at test time, using its standard training setup. We build a large-scale distributed index based on text embeddings of the Pile dataset. For each test input, our system retrieves its neighbors and fine-tunes the model on their text. Surprisingly, retrieving and training on as few as 20 neighbors, each for only one gradient iteration, drastically improves performance across more than 20 language modeling tasks in the Pile. For example, test-time training with nearest neighbors significantly narrows the performance gap between a small GPT-2 and a GPT-Neo model more than 10 times larger. Sufficient index quality and size, however, are necessary. Our work establishes a first baseline of test-time training for language modeling.

Code Repositories

socialfoundations/tttlm
Official
Mentioned in GitHub

Benchmarks

Benchmark: language-modelling-on-the-pile
Methodology: GPT-2 Large 774M (test-time training on nearest neighbors)
Metric: Bits per byte: 0.85

