GenExam: Multidisciplinary Text-to-Image Exam Benchmark Dataset
License: MIT
GenExam is the first multidisciplinary text-to-image exam-style benchmark dataset, released in 2025 by Shanghai Artificial Intelligence Laboratory, Shanghai Jiao Tong University, Tsinghua University, and other institutions. The accompanying paper, "GenExam: A Multidisciplinary Text-to-Image Exam", aims to test whether a model can integrate understanding, reasoning, and generation capabilities to actually solve drawing problems.
The dataset contains approximately 1,000 high-quality examples across 10 disciplines: mathematics, physics, chemistry, biology, computer science, engineering, medicine, art, geography, and history. Each example includes a diverse and challenging prompt, a corresponding ground-truth image, and fine-grained scoring points, reflecting the rigor and difficulty of real-world exams. The dataset was constructed in four stages: starting from a pool of approximately 40,000 images, candidates were automatically screened and given prompts by GPT-5, then rigorously reviewed by doctoral-level experts, yielding the final set of roughly 1,000 multidisciplinary examples.
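As a rough illustration of the structure described above (a prompt, a ground-truth image, and scoring points per example, grouped by discipline), the following Python sketch shows one way such a record might be represented. The class name, field names, and sample values are hypothetical and are not taken from the released dataset schema; only the list of ten disciplines comes from the description.

```python
from collections import Counter
from dataclasses import dataclass, field

# The ten disciplines named in the dataset description.
DISCIPLINES = (
    "mathematics", "physics", "chemistry", "biology", "computer science",
    "engineering", "medicine", "art", "geography", "history",
)


@dataclass
class GenExamExample:
    """Hypothetical record layout; field names are assumptions, not the real schema."""
    discipline: str
    prompt: str
    ground_truth_image: str  # assumed: path to the reference image
    scoring_points: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject disciplines outside the ten listed in the description.
        if self.discipline not in DISCIPLINES:
            raise ValueError(f"unknown discipline: {self.discipline}")


def count_by_discipline(examples):
    """Return how many examples fall under each discipline."""
    return Counter(ex.discipline for ex in examples)


# Placeholder records for illustration only; not real dataset entries.
examples = [
    GenExamExample(
        discipline="physics",
        prompt="Draw a free-body diagram of a block on an inclined plane.",
        ground_truth_image="images/placeholder_0001.png",
        scoring_points=["Gravity arrow points straight down",
                        "Normal force is perpendicular to the incline"],
    ),
    GenExamExample(
        discipline="chemistry",
        prompt="Draw the structural formula of ethanol.",
        ground_truth_image="images/placeholder_0002.png",
        scoring_points=["Hydroxyl group is attached to a carbon atom"],
    ),
]

counts = count_by_discipline(examples)
```

Keeping the scoring points as a list on each record makes it straightforward to check a generated image against each point individually, which is one plausible way to use fine-grained scoring; the benchmark's actual evaluation procedure is defined in the paper, not here.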
