3 months ago

Phantom of Latent for Large Language and Vision Models

Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro

Abstract

The success of visual instruction tuning has accelerated the development of large language and vision models (LLVMs). Following the scaling laws of instruction-tuned large language models (LLMs), LLVMs have further increased their sizes, reaching 26B, 34B, and even 80B parameters. While this increase in model size has yielded significant performance gains, it demands substantially more hardware resources for both training and inference. Consequently, there is a strong need for efficient LLVMs that achieve the performance of larger models while being smaller in size. To meet this need, we present Phantom, a new efficient LLVM family with model sizes of 0.5B, 1.8B, 3.8B, and 7B parameters, which significantly enhances learning capabilities within limited structures. By temporarily increasing the latent hidden dimension during multi-head self-attention (MHSA), we prepare LLVMs to take in and understand much more vision-language knowledge in the latent space, without substantially increasing physical model sizes. To maximize this advantage, we introduce Phantom Optimization (PO), which uses both autoregressive supervised fine-tuning (SFT) and a direct preference optimization (DPO)-like concept to follow correct answers effectively while eliminating incorrect and ambiguous ones. Phantom outperforms numerous larger open- and closed-source LLVMs, positioning itself as a leading solution in the landscape of efficient LLVMs.
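The abstract describes Phantom Optimization only at a high level, as an SFT term combined with a DPO-like preference term. The following is a minimal sketch of how such a combined objective could look, not the authors' implementation. It assumes a reference-free preference term on summed token log-probabilities; the function names (`log_sigmoid`, `po_like_loss`) and the parameters `beta` and `sft_weight` are hypothetical and do not come from the paper or its repository.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def po_like_loss(chosen_token_logprobs, rejected_token_logprobs,
                 beta=1.0, sft_weight=1.0):
    """Hypothetical SFT + DPO-like loss for one (correct, incorrect) answer pair.

    chosen_token_logprobs:   per-token log-probs of the correct answer
    rejected_token_logprobs: per-token log-probs of an incorrect or ambiguous answer
    beta, sft_weight:        illustrative hyperparameters (assumptions)
    """
    lp_chosen = sum(chosen_token_logprobs)
    lp_rejected = sum(rejected_token_logprobs)

    # SFT term: mean negative log-likelihood of the correct answer.
    sft_term = -lp_chosen / len(chosen_token_logprobs)

    # DPO-like term: push the correct answer's log-probability above the
    # rejected one's. Assumed reference-free here, unlike standard DPO.
    preference_term = -log_sigmoid(beta * (lp_chosen - lp_rejected))

    return sft_weight * sft_term + preference_term


if __name__ == "__main__":
    good = po_like_loss([-0.1, -0.2], [-2.0, -3.0])
    bad = po_like_loss([-2.0, -3.0], [-0.1, -0.2])
    print(f"model prefers correct answer:   loss = {good:.4f}")
    print(f"model prefers incorrect answer: loss = {bad:.4f}")
```

The loss is small when the model already assigns higher probability to the correct answer and grows when it favors the rejected one, which matches the abstract's description of following correct answers while suppressing incorrect and ambiguous ones.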

Code Repositories

byungkwanlee/phantom
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
visual-question-answering-on-mm-vet | Phantom-7B | GPT-4 score: 70.8
