Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Haoyuan Wu; Haisheng Zheng; Zhuolun He; Bei Yu

Abstract

Large language models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across general tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity, and expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce parameter-efficient sparsity crafting (PESC), which crafts dense models into sparse models using the mixture-of-experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal parameter increase while guaranteeing the quality of approximation in function space compared to original sparse upcycling. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our best sparse model outperforms other sparse and dense models and exhibits superior general capabilities compared to GPT-3.5. Our code is available at https://github.com/wuhy68/Parameter-Efficient-MoE.
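
The core mechanism described above is that every expert in an upcycled MoE layer reuses the original dense FFN weights and is distinguished only by a small trainable adapter. The PyTorch sketch below illustrates one way this can be realised; the class names, top-2 routing, and adapter placement are illustrative assumptions rather than the authors' released implementation.

```python
# Minimal sketch of the PESC idea: a dense FFN is upcycled into an MoE layer
# whose experts share the frozen dense weights and differ only through small,
# trainable bottleneck adapters. Names and routing details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Bottleneck adapter: the only per-expert trainable parameters."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: x + up(relu(down(x)))
        return x + self.up(F.relu(self.down(x)))


class PESCMoELayer(nn.Module):
    """Sparse MoE layer crafted from a dense FFN (illustrative sketch)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # FFN weights copied from the dense checkpoint and frozen; all experts share them.
        self.shared_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        for p in self.shared_ffn.parameters():
            p.requires_grad = False
        # Trainable pieces: the router plus one lightweight adapter per expert.
        self.router = nn.Linear(d_model, num_experts)
        self.adapters = nn.ModuleList(Adapter(d_model) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)          # (num_tokens, num_experts)
        weights, expert_idx = gate.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        shared = self.shared_ffn(x)                       # computed once, reused by every expert
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, adapter in enumerate(self.adapters):
                mask = expert_idx[:, k] == e
                if mask.any():
                    # Expert e = shared FFN output passed through its own adapter.
                    out[mask] += weights[mask, k, None] * adapter(shared[mask])
        return out
```

In this sketch only the router and the adapters receive gradients, so upcycling a dense FFN into, say, 8 experts adds roughly 8 x (2 x d_model x bottleneck) parameters rather than 8 full copies of the FFN, which is where the parameter-efficiency claim in the abstract comes from.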

Code Repositories

wuhy68/parameter-efficient-moe (Official, PyTorch)
ShayekhBinIslam/openrag (PyTorch)

Benchmarks

Benchmark | Methodology | Metric
arithmetic-reasoning-on-gsm8k | Camelidae-8×34B (5-shot) | Accuracy: 78.3
arithmetic-reasoning-on-gsm8k | Qwen2idae-16x14B (5-shot) | Accuracy: 77.8
code-generation-on-mbpp | Camelidae-8×34B (4-shot) | Accuracy: 41.4
code-generation-on-mbpp | Qwen2idae-16x14B (4-shot) | Accuracy: 48.6
common-sense-reasoning-on-arc-challenge | Camelidae-8×34B | Accuracy: 65.2
common-sense-reasoning-on-arc-easy | Camelidae-8×34B | Accuracy: 86.2
common-sense-reasoning-on-winogrande | Camelidae-8×34B | Accuracy: 80.9
math-word-problem-solving-on-math | Qwen2idae-16x14B (4-shot) | Accuracy: 29.9
math-word-problem-solving-on-math | Camelidae-8×34B (4-shot) | Accuracy: 22.6
multi-task-language-understanding-on-mmlu | Camelidae-8×34B (5-shot) | Average (%): 75.6
question-answering-on-piqa | Camelidae-8×34B | Accuracy: 82.7
