Dengchun Li, Yingzi Ma, Naizheng Wang, Zhengmao Ye, Zhiyuan Cheng, Yinghao Tang, Yan Zhang, Lei Duan, Jie Zuo, Cal Yang, Mingjie Tang

Abstract
Fine-tuning large language models (LLMs) is a common practice for adapting pre-trained models to specific applications. While methods such as LoRA effectively alleviate the GPU memory bottleneck during fine-tuning, their performance often falls short in multi-task scenarios. In contrast, Mixture-of-Experts (MoE) models such as Mixtral 8x7B demonstrate remarkable performance in multi-task learning while maintaining a relatively low parameter count. However, the resource demands of these MoE models remain challenging, particularly for consumer-grade GPUs with less than 24GB of memory. To address these challenges, we propose MixLoRA, a novel approach for constructing a resource-efficient sparse MoE model based on LoRA. MixLoRA inserts multiple LoRA-based experts into the feed-forward network block of a frozen pre-trained dense model and employs the commonly used top-k routing mechanism. Unlike existing LoRA-based MoE methods, MixLoRA significantly improves model performance by introducing independent LoRA adapters for the attention layers. In addition, we design an auxiliary load-balance loss to mitigate expert imbalance in the router. Our evaluations show that, in multi-task learning scenarios, MixLoRA improves accuracy by about 9% over state-of-the-art parameter-efficient fine-tuning (PEFT) methods. We also propose a new high-throughput framework that alleviates the computation and memory bottlenecks of MoE models during training and inference, reducing GPU memory consumption by 40% and token computation latency by 30% in both stages.
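The architecture described above — LoRA-based experts layered on a frozen dense FFN, a top-k router, and an auxiliary load-balance loss — can be illustrated with a minimal PyTorch sketch. This is not the paper's actual implementation (see the repositories below for that); all class names (`LoRAExpert`, `MixLoRALayer`) and hyperparameters here are illustrative assumptions, and the auxiliary loss follows the common Switch-Transformer-style formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One expert: a low-rank (LoRA) delta applied on top of the shared frozen FFN."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.lora_a = nn.Linear(dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # standard LoRA init: the delta starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lora_b(self.lora_a(x))


class MixLoRALayer(nn.Module):
    """Frozen dense FFN + top-k routed LoRA experts + auxiliary load-balance loss (sketch)."""

    def __init__(self, dim: int, hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
        self.ffn.requires_grad_(False)  # the pre-trained dense block stays frozen
        self.experts = nn.ModuleList(LoRAExpert(dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts, bias=False)  # top-k router
        self.n_experts, self.top_k = n_experts, top_k

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, dim)
        probs = F.softmax(self.gate(x), dim=-1)          # (num_tokens, n_experts)
        top_p, top_i = probs.topk(self.top_k, dim=-1)    # routing weights and expert ids
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalize over chosen experts

        delta = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = top_i[:, k] == e  # tokens routed to expert e in slot k
                if mask.any():
                    delta[mask] += top_p[mask, k].unsqueeze(-1) * expert(x[mask])

        # Switch-style auxiliary loss: n_experts * sum_e(f_e * P_e), where f_e is the
        # fraction of tokens whose top-1 choice is e and P_e is the mean gate probability.
        f = F.one_hot(top_i[:, 0], self.n_experts).float().mean(dim=0)
        P = probs.mean(dim=0)
        aux_loss = self.n_experts * torch.sum(f * P)

        return self.ffn(x) + delta, aux_loss
```

During training, `aux_loss` would be added (scaled by a small coefficient) to the task loss so that the router spreads tokens across experts; only the gate and the LoRA matrices receive gradients, which is what keeps the memory footprint low.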
Code Repositories
mikecovlee/mLoRA
Official
pytorch
Mentioned in GitHub
TUDB-Labs/MixLoRA
Official
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| common-sense-reasoning-on-arc-challenge | LLaMA-2 7B + MixLoRA | Accuracy: 58.1 |
| common-sense-reasoning-on-arc-challenge | LLaMA-2 13B + MixLoRA | Accuracy: 69.9 |
| common-sense-reasoning-on-arc-challenge | LLaMA-3 8B + MixLoRA | Accuracy: 79.9 |
| common-sense-reasoning-on-arc-easy | LLaMA-2 13B + MixLoRA | Accuracy: 83.5 |
| common-sense-reasoning-on-arc-easy | LLaMA-2 7B + MixLoRA | Accuracy: 77.7 |
| common-sense-reasoning-on-arc-easy | LLaMA-3 8B + MixLoRA | Accuracy: 86.5 |
| common-sense-reasoning-on-winogrande | LLaMA-3 8B + MixLoRA | Accuracy: 82.1 |
| common-sense-reasoning-on-winogrande | LLaMA-2 13B + MixLoRA | Accuracy: 86.3 |
| common-sense-reasoning-on-winogrande | LLaMA-2 7B + MixLoRA | Accuracy: 76.8 |
| question-answering-on-boolq | LLaMA-2 7B + MixLoRA | Accuracy: 72.7 |
| question-answering-on-boolq | LLaMA-3 8B + MixLoRA | Accuracy: 75.0 |
| question-answering-on-boolq | LLaMA-2 13B + MixLoRA | Accuracy: 77.1 |
| question-answering-on-openbookqa | LLaMA-2 7B + MixLoRA | Accuracy: 81.6 |
| question-answering-on-openbookqa | LLaMA-3 8B + MixLoRA | Accuracy: 84.8 |
| question-answering-on-openbookqa | LLaMA-2 13B + MixLoRA | Accuracy: 83.0 |
| question-answering-on-piqa | LLaMA-3 8B + MixLoRA | Accuracy: 87.6 |
| question-answering-on-piqa | LLaMA-2 7B + MixLoRA | Accuracy: 83.2 |
| question-answering-on-piqa | LLaMA-2 13B + MixLoRA | Accuracy: 86.8 |
| question-answering-on-social-iqa | LLaMA-3 8B + MixLoRA | Accuracy: 78.8 |
| question-answering-on-social-iqa | LLaMA-2 13B + MixLoRA | Accuracy: 82.5 |
| question-answering-on-social-iqa | LLaMA-2 7B + MixLoRA | Accuracy: 78.0 |