Abstract
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| optical-character-recognition-on-ocrbench-v2-english | GPT-4o | Accuracy: 47.6 |
| spatial-reasoning-on-6-dof-spatialbench | GPT-4o | Orientation-abs: 25.8; Orientation-rel: 44.2; Position-abs: 28.4; Position-rel: 49.4; Total: 36.2 |
| video-question-answering-on-tvbench | GPT-4o (8 frames) | Average Accuracy: 39.9 |
| visual-question-answering-vqa-on-vlm2-bench | GPT-4o | Average Score on VLM2-bench (9 subtasks): 60.36; GC-mat: 37.45; GC-trk: 39.27; OC-cnt: 80.62; OC-cpr: 74.17; OC-grp: 57.50; PC-VID: 66.75; PC-cnt: 90.50; PC-cpr: 50.00; PC-grp: 47.00 |
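
The VLM2-bench row reports both an average score and nine subtask scores. As a quick sanity check, the sketch below recomputes the average on the assumption that it is the unweighted mean of the nine subtasks. The table does not state this, and the names `subtask_scores` and `unweighted_mean` are illustrative only.

```python
# Subtask scores copied from the VLM2-bench row of the table above.
subtask_scores = {
    "GC-mat": 37.45,
    "GC-trk": 39.27,
    "OC-cnt": 80.62,
    "OC-cpr": 74.17,
    "OC-grp": 57.50,
    "PC-VID": 66.75,
    "PC-cnt": 90.50,
    "PC-cpr": 50.00,
    "PC-grp": 47.00,
}


def unweighted_mean(scores: dict[str, float]) -> float:
    """Return the plain arithmetic mean of the scores (assumed aggregation)."""
    return sum(scores.values()) / len(scores)


average = unweighted_mean(subtask_scores)
print(f"{average:.2f}")  # prints 60.36, matching the reported average
```

The same check does not reproduce the 6-DoF SpatialBench total: the plain mean of its four listed metrics is 36.95, not 36.2, so that total is presumably aggregated differently (for example, weighted by question count).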