OCRBench-v2 Text Recognition Benchmark Dataset

Date: 24 days ago
Size: 6.43 GB
Organization: ByteDance; Huazhong University of Science and Technology; South China University of Technology
Paper URL: 2501.00321

OCRBench-v2 is a benchmark for evaluating the optical character recognition (OCR) capabilities of large multimodal models (LMMs). It was released in 2025 by Huazhong University of Science and Technology, South China University of Technology, ByteDance, and other institutions. The accompanying paper, "OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning", describes how the benchmark measures LMM performance across a range of text-related tasks.

The dataset is a large-scale upgrade of OCRBench. Its public test set contains 10,000 manually verified question-and-answer pairs in Chinese and English. An additional private test set consists of 1,500 manually annotated text-rich images drawn from a variety of sources, including print books, e-books, scanned documents, and web content. The data covers 31 typical text scenarios and 23 subtasks, grouped into eight core OCR capabilities: text recognition, text detection, text reference location, relationship extraction, element parsing, mathematical operations, visual-text understanding, and knowledge reasoning.
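Because the subtasks are grouped under capabilities, per-sample results can be rolled up in two steps. The sketch below shows one plausible way to do this: average the samples within each subtask, then average the subtasks within each capability. The record fields (`capability`, `subtask`, `score`), the subtask names, and the aggregation rule are assumptions made for illustration. They are not taken from the dataset's files or from the benchmark's official scoring code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-sample results. The field names and subtask labels are
# illustrative only and do not reflect the dataset's actual schema.
results = [
    {"capability": "text recognition", "subtask": "subtask_a", "score": 1.0},
    {"capability": "text recognition", "subtask": "subtask_a", "score": 0.0},
    {"capability": "text recognition", "subtask": "subtask_b", "score": 1.0},
    {"capability": "knowledge reasoning", "subtask": "subtask_c", "score": 0.5},
]


def capability_scores(rows):
    """Average samples within each subtask, then subtasks within each capability."""
    per_subtask = defaultdict(list)
    for row in rows:
        per_subtask[(row["capability"], row["subtask"])].append(row["score"])

    per_capability = defaultdict(list)
    for (capability, _subtask), scores in per_subtask.items():
        per_capability[capability].append(mean(scores))

    return {cap: mean(vals) for cap, vals in per_capability.items()}


scores = capability_scores(results)
```

Averaging per subtask first keeps a subtask with many samples from dominating its capability score. A flat average over all samples would be an equally valid choice if every sample should carry the same weight.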

OCRBenchv2.torrent
Seeding: 1 | Downloading: 0 | Completed: 6 | Total Downloads: 29
  • OCRBenchv2/
    • README.md (1.81 KB)
    • README.txt (3.62 KB)
    • data/
      • OCRBenchv2.zip (6.43 GB)
