GOT-OCR-2.0: The World's First Universal End-to-End OCR Model

Date: a year ago

Size: 743.26 MB

Paper: arXiv:2409.01704

Project Introduction

GOT-OCR-2.0 is a unified end-to-end model based on the General OCR Theory, aimed at improving the accuracy and efficiency of optical character recognition (OCR). The project was jointly released by research teams from StepFun, Megvii Technology, the University of Chinese Academy of Sciences, and Tsinghua University. The related paper is "General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model". The model suits application scenarios such as scene text recognition and document recognition. Its integrated architecture lets it handle diverse and complex text efficiently, and besides scene text it can also process multi-page documents, bringing greater flexibility to the OCR field.

GOT-OCR-2.0 features include:

  • Strong versatility: Based on the General OCR Theory, it handles scene text as well as complex document structures such as tables and formulas.
  • End-to-end model: A single unified architecture covers the whole OCR process, from image input to text output.
  • Efficient performance: Integrated Flash-Attention improves recognition speed.
  • Multi-platform support: Supports CUDA acceleration and integrates with the GOT-OCR2.0 platform for loading pre-trained models.
  • Wide applicability: Suitable for a broad range of scenarios, including multi-page documents and scene text.
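
The "image input to text output" flow described above can be sketched for local use as follows. This is a minimal sketch, not the setup used by this project: the repository id `stepfun-ai/GOT-OCR2_0`, the `model.chat(...)` call, and the `ocr_type` values `"ocr"` and `"format"` are assumptions taken from the publicly released Hugging Face model card, and the helper names `pick_ocr_type` and `run_ocr` are hypothetical. It also assumes a CUDA-capable GPU and an installed `transformers` package.

```python
from pathlib import Path

# Assumed Hugging Face repository id; it is not stated on this page.
MODEL_ID = "stepfun-ai/GOT-OCR2_0"


def pick_ocr_type(formatted: bool) -> str:
    """Choose the OCR mode: plain text, or formatted output (tables, formulas)."""
    return "format" if formatted else "ocr"


def run_ocr(image_path: str, formatted: bool = False) -> str:
    """Run OCR on one image and return the recognized text (hypothetical helper)."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")

    # Imported here so the lightweight helpers work without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    # trust_remote_code is needed because the model ships its own modeling code.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="cuda",  # assumes a CUDA-capable GPU
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    model = model.eval().cuda()

    # `chat` comes from the model's remote code (an assumption, see above).
    return model.chat(tokenizer, str(path), ocr_type=pick_ocr_type(formatted))
```

Calling `run_ocr("page.png", formatted=True)` would request formatted output for a hypothetical local file `page.png`. The model weights are downloaded on first use.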

Effect examples


Run steps

1. Click "Clone" in the upper right corner of the project, then click "Next" through each step: Basic Information > Select Computing Power > Review. Finally, click "Continue" to open the project in your personal container.

2. Once resource allocation is complete, the model is initialized automatically in the background. You can then open the operation page directly through the API address provided by the platform. Real-name authentication must already be completed, and you do not need to open the workspace for this step.

3. Upload the target image
