SubLIME: Subset Selection via Rank Correlation Prediction for Data-Efficient LLM Evaluation
Gayathri Saranathan, Cong Xu, Mahammad Parwez Alam, Tarun Kumar, Martin Foltin, et al.

Abstract
The rapid expansion of Large Language Models (LLMs) and natural language processing datasets has made exhaustive benchmark evaluation computationally prohibitive. Inspired by high-stakes competitions like the International Mathematical Olympiad, where a few well-chosen problems suffice to differentiate top performers, we present SubLIME, which reduces evaluation costs by 80% to 99% while preserving ranking fidelity. SubLIME trains a Rank Correlation Prediction (RCP) model that combines limited performance data from only 5-20 anchor LLMs with intrinsic dataset metrics (Difficulty, Quality, and Distributional Dispersion) to predict how closely a candidate subset reflects full-benchmark rankings. Guided by these predictions, SubLIME selects a "winning" subset (1-20% of the full dataset) for evaluating new LLMs, preserving global rankings significantly better than other data-efficient methods across ten diverse benchmarks.
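To make the pipeline concrete, below is a minimal sketch of the idea in Python. It is not the paper's implementation: the feature function `subset_features`, the synthetic anchor scores, and the choice of a random-forest regressor for the RCP model are all illustrative assumptions. The sketch shows the core loop the abstract describes: measure, for many candidate subsets, how well subset-based rankings of the anchor LLMs correlate (Spearman) with full-benchmark rankings; fit a regressor from subset features to that correlation; then pick the candidate subset with the highest predicted correlation.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical anchor data: per-item scores of 10 anchor LLMs on a
# 1000-item benchmark (random stand-in for real evaluation results).
n_models, n_items = 10, 1000
scores = rng.random((n_models, n_items))

def subset_features(idx):
    """Illustrative stand-ins for the paper's intrinsic metrics
    (Difficulty, Quality, Distributional Dispersion) plus subset size."""
    sub = scores[:, idx]
    difficulty = 1.0 - sub.mean()        # harder items -> lower mean score
    dispersion = sub.mean(axis=0).std()  # spread of item-level scores
    return np.array([difficulty, dispersion, len(idx) / n_items])

# Full-benchmark score per anchor model defines the reference ranking.
full_scores = scores.mean(axis=1)

# Training data for the RCP model: for many random candidate subsets,
# compute the Spearman correlation between the subset-based ranking of
# the anchor models and the full-benchmark ranking.
X, y = [], []
for _ in range(200):
    size = int(rng.integers(10, 200))  # roughly 1-20% of the benchmark
    idx = rng.choice(n_items, size, replace=False)
    rho, _ = spearmanr(scores[:, idx].mean(axis=1), full_scores)
    X.append(subset_features(idx))
    y.append(rho)

rcp = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Select the "winning" subset: the candidate whose *predicted* rank
# correlation is highest. New LLMs would then be evaluated on this
# subset only, never on the full benchmark.
candidates = [rng.choice(n_items, 100, replace=False) for _ in range(50)]
best = max(candidates, key=lambda idx: rcp.predict([subset_features(idx)])[0])
print("selected subset size:", len(best))
```

The key property this sketch illustrates is that the expensive step (scoring models on the full benchmark) is paid only once, for the small pool of anchor LLMs; every later model is ranked from the cheap selected subset.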