TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
Yifan Wang, Binbin Liu, Fengze Liu, Yuanfan Guo, Jiyao Deng, Xuecheng Wu, Weidong Zhou, Xiaohuan Zhou, Taifeng Wang

Abstract
The data mixture used in the pre-training of a language model is a cornerstone of its final performance. However, a static mixing strategy is suboptimal, as the model's learning preferences for various data domains shift dynamically throughout training. Crucially, observing these evolving preferences in a computationally efficient manner remains a significant challenge. To address this, we propose TiKMiX, a method that dynamically adjusts the data mixture according to the model's evolving preferences. TiKMiX introduces Group Influence, an efficient metric for evaluating the impact of data domains on the model. This metric enables the formulation of the data mixing problem as a search for an optimal, influence-maximizing distribution. We solve this via two approaches: TiKMiX-D for direct optimization, and TiKMiX-M, which uses a regression model to predict a superior mixture. We trained models of various parameter counts on up to 1 trillion tokens. TiKMiX-D exceeds the performance of state-of-the-art methods like REGMIX while using just 20% of the computational resources. TiKMiX-M leads to an average performance gain of 2% across 9 downstream benchmarks. Our experiments reveal that a model's data preferences evolve with training progress and scale, and we demonstrate that dynamically adjusting the data mixture based on Group Influence, a direct measure of these preferences, significantly improves performance by mitigating the underdigestion of data seen with static ratios.
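To make the influence-maximizing formulation concrete, the sketch below illustrates one plausible reading of the abstract; it is not the authors' implementation. It assumes Group Influence is approximated by a first-order gradient inner product between a domain's averaged gradient and the gradient on a target set, and that the mixture search is entropy-regularized so the optimum takes a closed softmax form. All function names and the regularizer are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the pipeline described in the abstract.
# Assumptions (not stated in the abstract): Group Influence is the
# inner product of a domain's gradient with a target-set gradient,
# and the influence-maximizing search over the probability simplex
# adds an entropy term so every domain keeps a nonzero ratio.

def group_influence(domain_grad, target_grad):
    """First-order influence of a data group: <g_domain, g_target>."""
    return float(np.dot(domain_grad, target_grad))

def optimal_mixture(influence, temperature=1.0):
    """Weights on the simplex maximizing sum_d w_d * I_d + T * H(w).

    With an entropy regularizer H(w), the maximizer has the closed
    form of a softmax over influence scores scaled by 1/temperature.
    """
    z = np.asarray(influence, dtype=np.float64) / temperature
    z -= z.max()  # numerical stability before exponentiation
    w = np.exp(z)
    return w / w.sum()

# Toy example: three domains with random stand-in "gradients".
rng = np.random.default_rng(0)
g_target = rng.normal(size=64)
scores = [group_influence(rng.normal(size=64), g_target) for _ in range(3)]
weights = optimal_mixture(scores, temperature=2.0)
print(dict(zip(["web", "code", "academic"], weights.round(3))))
```

Under this reading, TiKMiX-D would recompute the mixture directly from fresh influence scores during training, while TiKMiX-M would instead fit a regression model mapping candidate mixtures to predicted quality and pick the best candidate.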