MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

Abstract
Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA effectively address GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Experts (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task learning while maintaining a reduced parameter count. However, the resource requirements of these MoEs remain challenging, particularly for consumer-grade GPUs with less than 24GB of memory. To tackle these challenges, we propose MixLoRA, an approach that constructs a resource-efficient sparse MoE model based on LoRA. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model and employs a commonly used top-k router. Unlike other LoRA-based MoE methods, MixLoRA further improves model performance with independent LoRA adapters in the attention layers. Additionally, an auxiliary load-balance loss is employed to address the imbalance problem of the router. Our evaluations show that MixLoRA improves accuracy by about 9% compared to state-of-the-art PEFT methods in multi-task learning scenarios. We also propose a new high-throughput framework to alleviate the computation and memory bottlenecks during the training and inference of MoE models. This framework reduces GPU memory consumption by 40% and token computation latency by 30% during both training and inference.
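To make the described architecture concrete, below is a minimal PyTorch sketch of a MixLoRA-style block: a frozen LLaMA-style SwiGLU feed-forward network shared by several LoRA-adapter experts, a top-k router, and a Switch-Transformer-style auxiliary load-balance loss. The class and attribute names (`LoRA`, `MixLoRABlock`, `gate_proj`, `up_proj`, `down_proj`), the default rank and top-k values, and the exact loss formulation are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a MixLoRA-style MoE feed-forward block (illustrative only).
# Assumed, not taken from the paper: module/attribute names, a LLaMA-style
# SwiGLU FFN, and the Switch-Transformer-style load-balance loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRA(nn.Module):
    """Low-rank adapter applied on top of a frozen, shared linear layer."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base                        # frozen weight, shared across experts
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)      # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scaling


class MixLoRABlock(nn.Module):
    """Frozen SwiGLU FFN turned into a sparse MoE via per-expert LoRA adapters."""
    def __init__(self, ffn, hidden_size: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        for p in ffn.parameters():              # keep the dense FFN frozen
            p.requires_grad_(False)
        self.experts = nn.ModuleList([
            nn.ModuleDict({
                "gate_proj": LoRA(ffn.gate_proj),
                "up_proj": LoRA(ffn.up_proj),
                "down_proj": LoRA(ffn.down_proj),
            })
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        bsz, seq, dim = x.shape
        tokens = x.view(-1, dim)
        logits = self.router(tokens)                            # [T, E]
        probs = logits.softmax(dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)   # [T, k]
        topk_probs = topk_probs / topk_probs.sum(-1, keepdim=True)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e)
            if not mask.any():
                continue
            token_ids, slot = mask.nonzero(as_tuple=True)
            h = tokens[token_ids]
            h = expert["down_proj"](
                F.silu(expert["gate_proj"](h)) * expert["up_proj"](h)
            )
            out.index_add_(0, token_ids,
                           h * topk_probs[token_ids, slot].unsqueeze(-1))

        # Auxiliary load-balance loss (one common formulation): encourage the
        # dispatch fraction and mean router probability to be uniform.
        num_experts = len(self.experts)
        dispatch_frac = F.one_hot(topk_idx, num_experts).float().mean(dim=(0, 1))
        mean_prob = probs.mean(dim=0)
        aux_loss = num_experts * torch.sum(dispatch_frac * mean_prob)

        return out.view(bsz, seq, dim), aux_loss
```

In a training loop, the returned `aux_loss` would typically be scaled by a small coefficient and added to the language-modeling loss; the independent attention-layer LoRA adapters mentioned in the abstract would be attached to the (also frozen) attention projections of the same decoder layer.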
Benchmarks
| Benchmark | Model | Accuracy (%) |
|---|---|---|
| common-sense-reasoning-on-arc-challenge | LLaMA-2 7B + MixLoRA | 58.1 |
| common-sense-reasoning-on-arc-challenge | LLaMA-2 13B + MixLoRA | 69.9 |
| common-sense-reasoning-on-arc-challenge | LLaMA-3 8B + MixLoRA | 79.9 |
| common-sense-reasoning-on-arc-easy | LLaMA-2 7B + MixLoRA | 77.7 |
| common-sense-reasoning-on-arc-easy | LLaMA-2 13B + MixLoRA | 83.5 |
| common-sense-reasoning-on-arc-easy | LLaMA-3 8B + MixLoRA | 86.5 |
| common-sense-reasoning-on-winogrande | LLaMA-2 7B + MixLoRA | 76.8 |
| common-sense-reasoning-on-winogrande | LLaMA-2 13B + MixLoRA | 86.3 |
| common-sense-reasoning-on-winogrande | LLaMA-3 8B + MixLoRA | 82.1 |
| question-answering-on-boolq | LLaMA-2 7B + MixLoRA | 72.7 |
| question-answering-on-boolq | LLaMA-2 13B + MixLoRA | 77.1 |
| question-answering-on-boolq | LLaMA-3 8B + MixLoRA | 75.0 |
| question-answering-on-openbookqa | LLaMA-2 7B + MixLoRA | 81.6 |
| question-answering-on-openbookqa | LLaMA-2 13B + MixLoRA | 83.0 |
| question-answering-on-openbookqa | LLaMA-3 8B + MixLoRA | 84.8 |
| question-answering-on-piqa | LLaMA-2 7B + MixLoRA | 83.2 |
| question-answering-on-piqa | LLaMA-2 13B + MixLoRA | 86.8 |
| question-answering-on-piqa | LLaMA-3 8B + MixLoRA | 87.6 |
| question-answering-on-social-iqa | LLaMA-2 7B + MixLoRA | 78.0 |
| question-answering-on-social-iqa | LLaMA-2 13B + MixLoRA | 82.5 |
| question-answering-on-social-iqa | LLaMA-3 8B + MixLoRA | 78.8 |