3 months ago

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yong Man Ro

Abstract

The rapid development of large language and vision models (LLVMs) has been driven by advances in visual instruction tuning. Recently, open-source LLVMs have curated high-quality visual instruction tuning datasets and utilized additional vision encoders or multiple computer vision models in order to narrow the performance gap with powerful closed-source LLVMs. These advancements are attributed to multifaceted information required for diverse capabilities, including fundamental image understanding, real-world knowledge about common-sense and non-object concepts (e.g., charts, diagrams, symbols, signs, and math problems), and step-by-step procedures for solving complex questions. Drawing from the multifaceted information, we present a new efficient LLVM, Mamba-based traversal of rationales (Meteor), which leverages multifaceted rationale to enhance understanding and answering capabilities. To embed lengthy rationales containing abundant information, we employ the Mamba architecture, capable of processing sequential data with linear time complexity. We introduce a new concept of traversal of rationale that facilitates efficient embedding of rationale. Subsequently, the backbone multimodal language model (MLM) is trained to generate answers with the aid of rationale. Through these steps, Meteor achieves significant improvements in vision language performances across multiple evaluation benchmarks requiring diverse capabilities, without scaling up the model size or employing additional vision encoders and computer vision models.
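
The abstract's point about linear time complexity can be illustrated with a toy example. The sketch below is not Meteor's code and not the Mamba architecture itself (Mamba uses input-dependent, selective state-space parameters). It is a scalar linear state-space recurrence with fixed, hypothetical parameters `a`, `b`, and `c`, shown only to make clear why a recurrent scan visits each sequence element once, so its cost grows linearly with sequence length.

```python
from typing import List


def linear_ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Toy scalar state-space recurrence (illustrative assumption, not Mamba).

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0  # hidden state, carried across time steps
    ys = []
    for x_t in x:  # one pass over the sequence: O(len(x)) time
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


# Hypothetical inputs chosen so every intermediate value is exact in floating point.
outputs = linear_ssm_scan([1.0, 0.0, 0.0, 2.0], a=0.5, b=1.0, c=2.0)
print(outputs)  # [2.0, 1.0, 0.5, 4.25]
```

Because the state `h` summarizes everything seen so far, each step does a constant amount of work regardless of how long the preceding sequence is, in contrast to attention, where each token attends to all earlier tokens.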

Code Repositories

byungkwanlee/meteor (Official, PyTorch)

Benchmarks

Benchmark: visual-question-answering-on-mm-vet
Methodology: Meteor
Metrics: GPT-4 score: 57.3; Params: 7B
