3 months ago

Preventing Local Pitfalls in Vector Quantization via Optimal Transport

Borui Zhang Wenzhao Zheng Jie Zhou Jiwen Lu

Abstract

Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates training by requiring techniques such as careful initialization and model distillation. In this study, we identify the local minima issue as the primary cause of this instability. To address it, we replace the nearest neighbor search with an optimal transport method to achieve a more globally informed assignment. We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to solve the optimal transport problem, thereby enhancing the stability and efficiency of the training process. To mitigate the influence of diverse data distributions on the Sinkhorn algorithm, we implement a straightforward yet effective normalization strategy. Our comprehensive experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.
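The idea of swapping nearest neighbor search for a Sinkhorn-based assignment can be illustrated with a minimal NumPy sketch. This is not the official OptVQ implementation: the function name `sinkhorn_assign`, the parameters `epsilon` and `n_iters`, the uniform marginals, and the min-max rescaling of the cost matrix are all assumptions chosen for illustration, and the normalization used in the paper may differ.

```python
import numpy as np

def sinkhorn_assign(features, codebook, epsilon=0.1, n_iters=50):
    """Assign features to codes with entropy-regularized optimal transport.

    Hypothetical helper for illustration; epsilon, n_iters, and the
    min-max cost rescaling are assumptions, not values from the paper.
    """
    n, m = features.shape[0], codebook.shape[0]
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Rescale the cost to [0, 1] so epsilon behaves similarly across data scales.
    cost = (cost - cost.min()) / (cost.max() - cost.min() + 1e-12)
    kernel = np.exp(-cost / epsilon)
    # Uniform marginals: each feature sends mass 1/n, each code receives 1/m.
    row_mass = np.full(n, 1.0 / n)
    col_mass = np.full(m, 1.0 / m)
    u = np.ones(n)
    v = np.ones(m)
    # Sinkhorn iterations: alternately rescale rows and columns of the kernel.
    for _ in range(n_iters):
        u = row_mass / (kernel @ v)
        v = col_mass / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    # Each feature takes the code that carries most of its transport mass.
    return plan.argmax(axis=1), plan

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 4))
codebook = rng.normal(size=(8, 4))
indices, plan = sinkhorn_assign(features, codebook)
```

Because the column marginals force every code to receive the same total mass, the transport plan cannot route all features to a single code the way a purely local nearest neighbor search can. Comparing `np.unique(indices).size` against the nearest neighbor result on your own data is a quick way to check whether this changes codebook usage in practice.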

Code Repositories

zbr17/OptVQ (official, PyTorch; mentioned in GitHub)

Benchmarks

Benchmark	Methodology	FID	LPIPS	PSNR	SSIM
image-reconstruction-on-imagenet	OptVQ (16x16x8)	0.91	0.066	27.57	0.729
image-reconstruction-on-imagenet	OptVQ (16x16x4)	1.00	0.076	26.59	0.717
