TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training
Yifan Wang, Binbin Liu, Fengze Liu, Yuanfan Guo, Jiyao Deng, Xuecheng Wu, Weidong Zhou, Xiaohuan Zhou, Taifeng Wang

Abstract
The data mixture used in the pre-training of a language model is a cornerstone of its final performance. However, a static mixing strategy is suboptimal, as the model's learning preferences for various data domains shift dynamically throughout training. Crucially, observing these evolving preferences in a computationally efficient manner remains a significant challenge. To address this, we propose TiKMiX, a method that dynamically adjusts the data mixture according to the model's evolving preferences. TiKMiX introduces Group Influence, an efficient metric for evaluating the impact of data domains on the model. This metric enables the formulation of the data mixing problem as a search for an optimal, influence-maximizing distribution. We solve this via two approaches: TiKMiX-D for direct optimization, and TiKMiX-M, which uses a regression model to predict a superior mixture. We trained models of various parameter counts on up to 1 trillion tokens. TiKMiX-D exceeds the performance of state-of-the-art methods like REGMIX while using just 20% of the computational resources. TiKMiX-M leads to an average performance gain of 2% across 9 downstream benchmarks. Our experiments reveal that a model's data preferences evolve with training progress and scale, and we demonstrate that dynamically adjusting the data mixture based on Group Influence, a direct measure of these preferences, significantly improves performance by mitigating the underdigestion of data seen with static ratios.
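To make the influence-maximizing formulation concrete, the sketch below illustrates one plausible reading of the abstract; it is not the authors' implementation. It assumes Group Influence is approximated by a first-order gradient inner product between a domain's averaged gradient and the gradient on a target set, and that the mixture search is entropy-regularized so the optimum takes a closed softmax form. All function names and the regularizer are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the pipeline described in the abstract.
# Assumptions (not stated in the abstract): Group Influence is the
# inner product of a domain's gradient with a target-set gradient,
# and the influence-maximizing search over the probability simplex
# adds an entropy term so every domain keeps a nonzero ratio.

def group_influence(domain_grad, target_grad):
    """First-order influence of a data group: <g_domain, g_target>."""
    return float(np.dot(domain_grad, target_grad))

def optimal_mixture(influence, temperature=1.0):
    """Weights on the simplex maximizing sum_d w_d * I_d + T * H(w).

    With an entropy regularizer H(w), the maximizer has the closed
    form of a softmax over influence scores scaled by 1/temperature.
    """
    z = np.asarray(influence, dtype=np.float64) / temperature
    z -= z.max()  # numerical stability before exponentiation
    w = np.exp(z)
    return w / w.sum()

# Toy example: three domains with random stand-in "gradients".
rng = np.random.default_rng(0)
g_target = rng.normal(size=64)
scores = [group_influence(rng.normal(size=64), g_target) for _ in range(3)]
weights = optimal_mixture(scores, temperature=2.0)
print(dict(zip(["web", "code", "academic"], weights.round(3))))
```

Under this reading, TiKMiX-D would recompute the mixture directly from fresh influence scores during training, while TiKMiX-M would instead fit a regression model mapping candidate mixtures to predicted quality and pick the best candidate.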