MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

Abstract

Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Experts (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task learning scenarios while maintaining a reduced parameter count. However, the resource requirements of these MoEs remain challenging, particularly for consumer-grade GPUs with less than 24GB of memory. To tackle these challenges, we propose MixLoRA, an approach for constructing a resource-efficient sparse MoE model based on LoRA. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model and employs a commonly used top-k router. Unlike other LoRA-based MoE methods, MixLoRA enhances model performance by utilizing independent attention-layer LoRA adapters. Additionally, an auxiliary load balance loss is employed to address the imbalance problem of the router. Our evaluations show that MixLoRA improves accuracy by about 9% compared to state-of-the-art PEFT methods in multi-task learning scenarios. We also propose a new high-throughput framework to alleviate the computation and memory bottlenecks during the training and inference of MoE models. This framework reduces GPU memory consumption by 40% and token computation latency by 30% during both training and inference.
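
The abstract outlines the core MixLoRA design: LoRA-based experts inserted into a frozen feed-forward block, a top-k router, and an auxiliary load balance loss. The PyTorch sketch below illustrates how such a block could be wired together. It is a minimal illustration only, not the official mLoRA implementation; names such as LoRAExpert, MixLoRABlock, num_experts, and top_k, as well as the Switch-Transformer-style formulation of the balance loss, are assumptions made for this example.

```python
# Minimal sketch of a MixLoRA-style FFN block (illustrative only; not the
# official mikecovlee/mLoRA code). Class and parameter names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """A low-rank adapter applied on top of a shared, frozen dense FFN."""

    def __init__(self, hidden_dim: int, ffn_dim: int, rank: int = 8):
        super().__init__()
        # LoRA factors for the up- and down-projections of the FFN.
        self.up_a = nn.Linear(hidden_dim, rank, bias=False)
        self.up_b = nn.Linear(rank, ffn_dim, bias=False)
        self.down_a = nn.Linear(ffn_dim, rank, bias=False)
        self.down_b = nn.Linear(rank, hidden_dim, bias=False)
        nn.init.zeros_(self.up_b.weight)
        nn.init.zeros_(self.down_b.weight)

    def forward(self, x, frozen_up, frozen_down):
        # Frozen dense weights are shared by all experts; only LoRA factors train.
        h = F.silu(frozen_up(x) + self.up_b(self.up_a(x)))
        return frozen_down(h) + self.down_b(self.down_a(h))


class MixLoRABlock(nn.Module):
    """Sparse MoE block: a top-k router dispatches tokens to LoRA experts."""

    def __init__(self, hidden_dim: int, ffn_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [LoRAExpert(hidden_dim, ffn_dim) for _ in range(num_experts)]
        )
        # Stand-ins for the pre-trained FFN weights; in practice these are
        # loaded from the dense base model and kept frozen.
        self.frozen_up = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.frozen_down = nn.Linear(ffn_dim, hidden_dim, bias=False)
        for p in list(self.frozen_up.parameters()) + list(self.frozen_down.parameters()):
            p.requires_grad = False

    def forward(self, x):
        # x: (tokens, hidden_dim)
        logits = self.router(x)                            # (tokens, num_experts)
        probs = logits.softmax(dim=-1)
        weights, indices = probs.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            y = expert(x[token_idx], self.frozen_up, self.frozen_down)
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * y

        # Auxiliary load balance loss (Switch-Transformer style): pushes the
        # router toward assigning tokens uniformly across experts.
        num_experts = len(self.experts)
        frac_tokens = F.one_hot(indices[:, 0], num_experts).float().mean(dim=0)
        frac_probs = probs.mean(dim=0)
        aux_loss = num_experts * torch.sum(frac_tokens * frac_probs)
        return out, aux_loss
```

In a full model, one such block would stand in for each transformer FFN during fine-tuning, with the aux_loss terms summed across layers and added with a small coefficient to the task loss; the independent attention-layer LoRA adapters mentioned in the abstract would be attached separately to the frozen attention projections.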

Code Repositories

mikecovlee/mLoRA (official, PyTorch)
TUDB-Labs/MixLoRA (official, PyTorch)

Benchmarks

Benchmark | Methodology | Accuracy
common-sense-reasoning-on-arc-challenge | LLaMA-2 7B + MixLoRA | 58.1
common-sense-reasoning-on-arc-challenge | LLaMA-2 13B + MixLoRA | 69.9
common-sense-reasoning-on-arc-challenge | LLaMA-3 8B + MixLoRA | 79.9
common-sense-reasoning-on-arc-easy | LLaMA-2 7B + MixLoRA | 77.7
common-sense-reasoning-on-arc-easy | LLaMA-2 13B + MixLoRA | 83.5
common-sense-reasoning-on-arc-easy | LLaMA-3 8B + MixLoRA | 86.5
common-sense-reasoning-on-winogrande | LLaMA-2 7B + MixLoRA | 76.8
common-sense-reasoning-on-winogrande | LLaMA-2 13B + MixLoRA | 86.3
common-sense-reasoning-on-winogrande | LLaMA-3 8B + MixLoRA | 82.1
question-answering-on-boolq | LLaMA-2 7B + MixLoRA | 72.7
question-answering-on-boolq | LLaMA-2 13B + MixLoRA | 77.1
question-answering-on-boolq | LLaMA-3 8B + MixLoRA | 75.0
question-answering-on-openbookqa | LLaMA-2 7B + MixLoRA | 81.6
question-answering-on-openbookqa | LLaMA-2 13B + MixLoRA | 83.0
question-answering-on-openbookqa | LLaMA-3 8B + MixLoRA | 84.8
question-answering-on-piqa | LLaMA-2 7B + MixLoRA | 83.2
question-answering-on-piqa | LLaMA-2 13B + MixLoRA | 86.8
question-answering-on-piqa | LLaMA-3 8B + MixLoRA | 87.6
question-answering-on-social-iqa | LLaMA-2 7B + MixLoRA | 78.0
question-answering-on-social-iqa | LLaMA-2 13B + MixLoRA | 82.5
question-answering-on-social-iqa | LLaMA-3 8B + MixLoRA | 78.8
