
Abstract

Model quantization and compression are widely used techniques for reducing computational resource consumption at inference time. While state-of-the-art methods already achieve reasonable accuracy at higher bit widths such as 4 or 8 bits, quantizing or compressing models further, down to 1 or 2 bits for example, remains a significant challenge. To address this problem, we focus on outliers in the weights of a pre-trained model, which disrupt effective low-bit quantization and compression. We propose Range Restriction Loss (R2-Loss), which builds low-bit-friendly models by removing weight outliers during pre-training. By effectively restricting the range of the weights, R2-Loss makes the overall weight distribution more compact, improving bit resolution during quantization so that quantization and compression techniques can better exploit their limited numeric representation capacity. We introduce three variants of R2-Loss, an L∞-norm R2-Loss, its extension Margin R2-Loss, and a new Soft-Min-Max R2-Loss, all of which can be used as auxiliary losses during full-precision model training. The L∞-norm and Margin R2-Loss are effective for symmetric quantization, whereas the Soft-Min-Max R2-Loss performs better for model compression. Experiments show that R2-Loss substantially improves low-bit quantization and compression accuracy when combined with state-of-the-art post-training quantization (PTQ), quantization-aware training (QAT), and model compression techniques. Specifically, with R2-Loss, MobileNet-V2 PTQ with 2-bit weights and 8-bit activations improves from 50.66% to 59.49%, MobileNet-V1 QAT with 2-bit weights and activations improves from 55.96% to 59.05%, and ResNet18 1-bit weight compression improves from 45.54% to 52.58%. These results demonstrate the effectiveness and generality of R2-Loss for efficient low-bit model quantization and compression.
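To illustrate how a range-restricting auxiliary loss of this kind can be attached to full-precision training, the sketch below gives one possible PyTorch formulation. The function names (`linf_r2_loss`, `margin_r2_loss`, `soft_min_max_r2_loss`), the margin and temperature hyper-parameters, and the log-sum-exp smoothing of the min/max are illustrative assumptions rather than the paper's reference implementation.

```python
# Minimal sketch (not the authors' reference code) of R2-Loss-style auxiliary
# regularizers added to the task loss during full-precision training.
import torch
import torch.nn as nn


def linf_r2_loss(weight: torch.Tensor) -> torch.Tensor:
    # Penalize the largest absolute weight, pulling outliers toward zero.
    return weight.abs().max()


def margin_r2_loss(weight: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
    # Only weights whose magnitude exceeds a margin are penalized,
    # leaving the bulk of the distribution untouched. The margin value
    # here is an illustrative assumption.
    return torch.clamp(weight.abs() - margin, min=0.0).pow(2).mean()


def soft_min_max_r2_loss(weight: torch.Tensor, temperature: float = 10.0) -> torch.Tensor:
    # Smooth (log-sum-exp) surrogates of max(w) and min(w); penalizing their
    # difference shrinks the possibly asymmetric weight range that model
    # compression has to cover. The smoothing choice is an assumption.
    w = weight.flatten()
    soft_max = torch.logsumexp(w * temperature, dim=0) / temperature
    soft_min = -torch.logsumexp(-w * temperature, dim=0) / temperature
    return soft_max - soft_min


def total_loss(model: nn.Module, task_loss: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    # Auxiliary term summed over all weight tensors and added to the task loss.
    reg = sum(linf_r2_loss(p) for n, p in model.named_parameters() if "weight" in n)
    return task_loss + lam * reg
```

In this sketch the regularizer weight `lam` plays the role of balancing task accuracy against how aggressively the weight range is compacted; the appropriate value would need to be tuned per model and target bit width.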
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| model-compression-on-imagenet | MobileNet-v1 + 2bit-2dim model compression using DKM | Top-1: 53.99 |
| model-compression-on-imagenet | ResNet-18 + 4bit-1dim model compression using DKM | Top-1: 70.52 |
| model-compression-on-imagenet | ResNet-18 + 2bit-1dim model compression using DKM | Top-1: 68.63 |
| model-compression-on-imagenet | MobileNet-v1 + 2bit-1dim model compression using DKM | Top-1: 67.62 |
| model-compression-on-imagenet | MobileNet-v1 + 1bit-1dim model compression using DKM | Top-1: 52.58 |
| model-compression-on-imagenet | ResNet-18 + 4bit-4dim model compression using DKM | Top-1: 66.1 |
| model-compression-on-imagenet | MobileNet-v1 + 4bit-4dim model compression using DKM | Top-1: 61.4 |
| model-compression-on-imagenet | ResNet-18 + 2bit-2dim model compression using DKM | Top-1: 64.7 |
| model-compression-on-imagenet | ResNet-18 + 1bit-1dim model compression using DKM | Top-1: 59.7 |
| model-compression-on-imagenet | MobileNet-v1 + 4bit-1dim model compression using DKM | Top-1: 69.63 |
| model-compression-on-qnli | MobileBERT + 2bit-1dim model compression using DKM | Accuracy: 82.13 |
| model-compression-on-qnli | MobileBERT + 1bit-1dim model compression using DKM | Accuracy: 63.17 |
| quantization-on-imagenet | ResNet-18 + PACT + R2Loss | Weight bits: 2, Activation bits: 4, Top-1 Accuracy (%): 68.45 |
| quantization-on-imagenet | MobileNet-v1 + EWGS + R2Loss | Weight bits: 4, Top-1 Accuracy (%): 69.79 |
| quantization-on-imagenet | MobileNet-v1 + LSQ + R2Loss | Top-1 Accuracy (%): 69.64 |