HyperAI

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

Yang Zhengyuan; Li Linjie; Lin Kevin; Wang Jianfeng; Lin Chung-Ching; Liu Zicheng; Wang Lijuan

Abstract

Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, containing test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and the effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models. Finally, we acknowledge that the model under our study is solely the product of OpenAI's innovative work, and they should be fully credited for its development. Please see the GPT-4V contributions paper for the authorship and credit attribution: https://cdn.openai.com/contributions/gpt-4v.pdf

Code Repositories

vista-h/gpt-4v_social_media
Mentioned in GitHub
qi-zhangyang/gemini-vs-gpt4v
Mentioned in GitHub

Benchmarks

Benchmark: mmr-total-on-mrr-benchmark
Methodology: GPT-4V
Metrics: Total Column Score: 415
