GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro, Yu-Chiang Frank Wang, Yueh-Hua Wu


Abstract

Recent advancements in vision-language models (VLMs) have leveraged large language models (LLMs) to achieve performance on par with closed-source systems like GPT-4V. However, deploying these models in real-world scenarios, particularly on resource-constrained devices, remains challenging due to their substantial computational demands. This has spurred interest in distilling knowledge from large VLMs into smaller, more efficient counterparts. A key challenge arises here from the diversity of VLM architectures, which are built on different LLMs and employ varying token types that differ in vocabulary size, token splits, and token index ordering. To avoid being limited to a specific VLM type, we present Generation after Recalibration (GenRecal), a novel, general-purpose distillation framework for VLMs. GenRecal incorporates a Recalibrator that aligns and adapts feature representations between heterogeneous VLMs, enabling effective knowledge transfer across different types of VLMs. Through extensive experiments on multiple challenging benchmarks, we demonstrate that GenRecal significantly improves baseline performances, eventually outperforming large-scale open- and closed-source VLMs.
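The abstract does not give implementation details, but the core idea it describes, a recalibration module that maps a large teacher VLM's features into a smaller student VLM's representation space so a distillation loss can be applied across heterogeneous architectures, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual design: the module layout, the mean-pooling over tokens, and the cosine-based loss are all assumptions made for the example.

```python
# Minimal sketch (assumption): a "recalibrator" that projects a teacher VLM's
# hidden states into the student's feature space so a distillation loss can be
# computed despite different hidden sizes and tokenizations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Recalibrator(nn.Module):
    """Projects teacher features (teacher_dim) into the student space (student_dim)."""

    def __init__(self, teacher_dim: int, student_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(teacher_dim, student_dim),
            nn.GELU(),
            nn.Linear(student_dim, student_dim),
        )

    def forward(self, teacher_hidden: torch.Tensor) -> torch.Tensor:
        # teacher_hidden: (batch, seq_len, teacher_dim) -> (batch, seq_len, student_dim)
        return self.proj(teacher_hidden)


def distillation_loss(student_hidden, teacher_hidden, recalibrator):
    """Align recalibrated teacher features with student features.

    Sequence lengths may differ between heterogeneous VLMs; mean-pooling over
    tokens is used here purely as a simplification for illustration.
    """
    t = recalibrator(teacher_hidden).mean(dim=1)  # (batch, student_dim)
    s = student_hidden.mean(dim=1)                # (batch, student_dim)
    return 1.0 - F.cosine_similarity(s, t, dim=-1).mean()


# Toy usage: random tensors stand in for real teacher/student hidden states.
recal = Recalibrator(teacher_dim=4096, student_dim=2048)
teacher_h = torch.randn(2, 128, 4096)
student_h = torch.randn(2, 64, 2048)
loss = distillation_loss(student_h, teacher_h, recal)
```

In practice such a distillation loss would be combined with the student's own generation objective; the sketch only shows how a feature-space bridge between mismatched teacher and student architectures could look.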
