Yang Zhengyuan; Li Linjie; Lin Kevin; Wang Jianfeng; Lin Chung-Ching; Liu Zicheng; Wang Lijuan

Abstract
Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, using test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability to process arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and a better understanding of multimodal foundation models. Finally, we acknowledge that the model under our study is solely the product of OpenAI's innovative work, and OpenAI should be fully credited for its development. Please see the GPT-4V contributions paper for authorship and credit attribution: https://cdn.openai.com/contributions/gpt-4v.pdf
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| mmr-total-on-mrr-benchmark | GPT-4V | Total Column Score: 415 |