Multi-dimensional pre-training Data Screening Framework Meta-rater

Date

2 months ago

Meta-rater, a multi-dimensional data selection method for pre-training language models, was proposed by Shanghai Artificial Intelligence Laboratory and East China Normal University on June 4, 2025. It integrates four dimensions (professionalism, readability, reasoning, and cleanliness) with existing quality indicators by learning optimal weights for combining them. The related paper, "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models", won the ACL 2025 Best Theme Paper Award.

Meta-rater uses surrogate models to train a regression model that predicts validation-set loss, and then uses those predictions to identify the optimal combination of quality scores. Experimental results show that Meta-rater can triple the convergence speed of a 1.3-billion-parameter model and improve downstream task performance by 3.23%. This advantage scales to a 7.2-billion-parameter model.
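The weight-search idea can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in ordinary least squares on quadratic features as the regression model, and the function `simulated_proxy_loss`, its hidden optimum, and all sample sizes are hypothetical stand-ins for actually training small surrogate models on data selected with each weight combination.

```python
import numpy as np

RATERS = ["professionalism", "readability", "reasoning", "cleanliness"]

def features(weights):
    """Bias, linear, and squared terms for each rater weight."""
    w = np.atleast_2d(np.asarray(weights, dtype=float))
    return np.hstack([np.ones((w.shape[0], 1)), w, w ** 2])

def fit_loss_regressor(weights, losses):
    """Least-squares fit mapping weight combinations to validation loss."""
    targets = np.asarray(losses, dtype=float)
    coef, *_ = np.linalg.lstsq(features(weights), targets, rcond=None)
    return coef

def predict_loss(coef, weights):
    """Predicted validation loss for one or more weight combinations."""
    return features(weights) @ coef

def simulated_proxy_loss(weights, rng):
    # HYPOTHETICAL stand-in: in a real pipeline each value would be the
    # validation loss of a small surrogate model trained on data selected
    # with the given rater weights. The optimum below is made up.
    hidden_optimum = np.array([0.4, 0.3, 0.2, 0.1])
    w = np.atleast_2d(weights)
    noise = rng.normal(0.0, 0.005, size=w.shape[0])
    return 3.0 + ((w - hidden_optimum) ** 2).sum(axis=1) + noise

rng = np.random.default_rng(0)

# 1. Sample weight combinations (non-negative, summing to 1) and record losses.
train_weights = rng.dirichlet(np.ones(len(RATERS)), size=200)
train_losses = simulated_proxy_loss(train_weights, rng)

# 2. Fit the regression model on (weights, loss) pairs.
coef = fit_loss_regressor(train_weights, train_losses)

# 3. Score many candidate combinations and keep the lowest predicted loss.
candidates = rng.dirichlet(np.ones(len(RATERS)), size=20000)
best_weights = candidates[np.argmin(predict_loss(coef, candidates))]

print(dict(zip(RATERS, best_weights.round(2))))
```

The point of the regression step is cost: once it is fitted, thousands of candidate weightings can be scored without training another model, so only the initial batch of surrogate runs is expensive.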
