6 months ago

Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, and 17 more

Abstract

Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy. Building on top of VILA, we improve its model architecture by first scaling up the spatial and temporal resolutions, and then compressing visual tokens. This "scale-then-compress" approach enables NVILA to efficiently process high-resolution images and long videos. We also conduct a systematic investigation to enhance the efficiency of NVILA throughout its entire lifecycle, from training and fine-tuning to deployment. NVILA matches or surpasses the accuracy of many leading open and proprietary VLMs across a wide range of image and video benchmarks. At the same time, it reduces training costs by 4.5X, fine-tuning memory usage by 3.4X, pre-filling latency by 1.6-2.2X, and decoding latency by 1.2-2.8X. We will soon make our code and models available to facilitate reproducibility.

NVILA: Efficient Frontier Visual Language Models

