3 months ago

Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter, Sascha Bongni, Ido Hakimi, Andreas Krause

Abstract

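Recent efforts in fine-tuning language models often rely on automatic data selection, commonly using Nearest Neighbors retrieval from large datasets. However, we show theoretically that this approach tends to select redundant data, which limits its effectiveness and can even hurt performance. To address this, we introduce SIFT, a data selection algorithm designed to reduce uncertainty about the model's response given a prompt, which unifies ideas from retrieval and active learning. Whereas Nearest Neighbor retrieval typically fails in the presence of information duplication, SIFT accounts for information duplication and optimizes the overall information gain of the selected examples. We focus our evaluations on fine-tuning at test-time for prompt-specific language modeling on the Pile dataset, and show that SIFT consistently outperforms Nearest Neighbor retrieval with minimal computational overhead. Moreover, we show that our uncertainty estimates can predict the performance gain of test-time fine-tuning, and use this to develop an adaptive algorithm that invests test-time compute proportional to realized performance gains. We provide the open-source $\texttt{activeft}$ (Active Fine-Tuning) library, which can be used as a drop-in replacement for Nearest Neighbor retrieval and makes the method easy to deploy and validate in practice.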
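The contrast between Nearest Neighbor retrieval and uncertainty-reducing selection can be illustrated with a toy example. The following is a minimal sketch, not the `activeft` API: it assumes unit-norm embeddings, a linear kernel, and a ridge/Gaussian-process surrogate with a regularizer `lam`, and it greedily picks the example that most reduces the surrogate's posterior variance at the prompt. All function names, the toy vectors, and `lam=0.1` are hypothetical choices made for illustration.

```python
import numpy as np

def nearest_neighbors(prompt, data, n):
    """Indices of the n rows of `data` with the largest dot product to `prompt`."""
    sims = data @ prompt
    return [int(i) for i in np.argsort(-sims, kind="stable")[:n]]

def posterior_variance(X, prompt, lam):
    """Uncertainty left about `prompt` after observing the rows of X.

    Assumption of this sketch: linear kernel k(a, b) = a . b with noise `lam`,
    so the variance is k(p, p) - k_X(p)^T (K_X + lam * I)^{-1} k_X(p).
    """
    K = X @ X.T
    k = X @ prompt
    A = K + lam * np.eye(len(X))
    return float(prompt @ prompt - k @ np.linalg.solve(A, k))

def greedy_uncertainty_reduction(prompt, data, n, lam=0.1):
    """Greedily add the example that minimizes the variance at the prompt."""
    selected = []
    for _ in range(n):
        best_i, best_var = None, float("inf")
        for i in range(len(data)):
            var = posterior_variance(data[selected + [i]], prompt, lam)
            if var < best_var - 1e-12:  # strict improvement; ties keep the earlier index
                best_i, best_var = i, var
        selected.append(best_i)
    return selected

# Toy data: the prompt needs information along two directions.
prompt = np.array([0.8, 0.6, 0.0])
data = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],  # exact duplicate of row 0
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

nn_choice = nearest_neighbors(prompt, data, 2)
greedy_choice = greedy_uncertainty_reduction(prompt, data, 2)
print(nn_choice, greedy_choice)
```

On this toy data, Nearest Neighbor retrieval returns rows 0 and 1, which carry the same information, while the greedy rule returns rows 0 and 2, because a second copy of row 0 barely lowers the variance once the first copy has been selected.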

Code Repositories

jonhue/activeft

Official

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Method	Metric
language-modelling-on-the-pile	Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B)	Bits per byte: 0.557
language-modelling-on-the-pile	Llama-3.2 3B	Bits per byte: 0.640
language-modelling-on-the-pile	Phi-3 3.8B	Bits per byte: 0.679
language-modelling-on-the-pile	Gemma-2 9B	Bits per byte: 0.670
language-modelling-on-the-pile	Gemma-2 27B	Bits per byte: 0.629
language-modelling-on-the-pile	Gemma-2 2B	Bits per byte: 0.721
language-modelling-on-the-pile	Test-Time Fine-Tuning with SIFT + GPT-2 (124M)	Bits per byte: 0.862
language-modelling-on-the-pile	Phi-3 7B	Bits per byte: 0.678
language-modelling-on-the-pile	Test-Time Fine-Tuning with SIFT + GPT-2 (774M)	Bits per byte: 0.762
language-modelling-on-the-pile	Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B)	Bits per byte: 0.595
language-modelling-on-the-pile	Llama-3.2 1B	Bits per byte: 0.697
language-modelling-on-the-pile	Phi-3 14B	Bits per byte: 0.651
language-modelling-on-the-pile	Llama-3.2-Instruct 1B	Bits per byte: 0.807
language-modelling-on-the-pile	Llama-3.2-Instruct 3B	Bits per byte: 0.737
language-modelling-on-the-pile	Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B)	Bits per byte: 0.606
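To read the table, it helps to recall how the metric is usually defined and to compare each base model with its test-time fine-tuned counterpart. The sketch below assumes the common definition of bits per byte (total negative log-likelihood in nats, converted to bits and divided by the number of UTF-8 bytes); the helper names are hypothetical, and the numbers are copied from the rows above that list both a base model and a SIFT variant.

```python
import math

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood (nats) into bits per byte."""
    return total_nll_nats / math.log(2) / n_bytes

def relative_reduction(base, tuned):
    """Fraction by which `tuned` lowers the bits-per-byte of `base`."""
    return (base - tuned) / base

# (base model, test-time fine-tuning with SIFT), values taken from the table above.
pairs = {
    "Llama-3.2 1B": (0.697, 0.606),
    "Llama-3.2 3B": (0.640, 0.557),
    "Phi-3 3.8B": (0.679, 0.595),
}

reductions = {name: relative_reduction(b, t) for name, (b, t) in pairs.items()}
for name, r in reductions.items():
    print(f"{name}: {r:.1%} lower bits per byte")
```

For these three pairs the reduction is roughly 12 to 13 percent. The GPT-2 rows are left out because the table lists no GPT-2 baseline without test-time fine-tuning.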
