7 months ago

Bingchen Zhao, Despoina Magka, Minqi Jiang, Xian Li, Roberta Raileanu, Tatiana Shavrina, Jean-Christophe Gagnon-Audet, Kelvin Niu, Shagun Sodhani, Michael Shvartsman, and 13 more

Abstract

Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce results in an active research area, we introduce the Automated LLM Speedrunning Benchmark, leveraging the research community's contributions to the NanoGPT speedrun, a competition to train a GPT-2 model in the shortest time. Each of the 19 speedrun tasks provides the agent with the previous record's training script, optionally paired with one of three hint formats, ranging from pseudocode to paper-like descriptions of the new record's improvements. Records execute quickly by design, and speedrun improvements encompass diverse code-level changes, ranging from high-level algorithmic advancements to hardware-aware optimizations. These features make the benchmark both accessible and realistic for the frontier problem of improving LLM training. We find that recent reasoning LLMs combined with SoTA scaffolds struggle to reimplement already-known innovations in our benchmark, even when given detailed hints. Our benchmark thus provides a simple, non-saturated measure of an LLM's ability to automate scientific reproduction, a necessary (but not sufficient) skill for an autonomous research agent.

The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
