HyperAI

Recently, a model called dots.ocr has been making waves in OCR with its lightweight design and precise text extraction. dots.ocr is a multilingual document layout parsing model released by Xiaohongshu's hi lab in August 2025. It is built on a 1.7-billion-parameter vision-language model (VLM) that performs layout detection and content recognition in a unified manner. Whether the input is a blurry scan, a tilted mobile phone snapshot, or a low-resolution screenshot, dots.ocr can accurately capture fragmented text through adaptive noise reduction and dynamic segmentation. With fewer than 2B parameters, the compact architecture lets industrial equipment, mobile terminals, and even embedded systems achieve millisecond-level real-time text recognition without depending on the cloud.

More notably, dots.ocr breaks traditional OCR's reliance on structured documents. By combining multi-scale feature fusion with context-based semantic error correction, the model maintains coherence and accuracy close to human reading when recognizing sloppy handwriting, dense tabular data, or mixed-layout text. For multilingual document processing, it supports 100 languages, including Chinese and English, and can accurately identify text content and layout elements, delivering stable and accurate parsing results even in complex language environments. In benchmarks such as OmniDocBench, dots.ocr's formula recognition rivals that of larger models such as Doubao-1.5 and Gemini2.5-Pro, and it shows a significant advantage in parsing minority languages, truly achieving the goal of "small yet precise."
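To make "layout detection and content recognition in a unified manner" more concrete, here is a minimal Python sketch of how a combined result might be consumed downstream. It does not call the model. The sample data and the field names ("bbox", "category", "text") are assumptions made for illustration, not the confirmed output schema of dots.ocr, and the top-to-bottom sort is only a naive stand-in for real reading-order logic.

```python
import json

# Hypothetical sample of a unified layout + content result for one page.
# The field names and values are assumptions for illustration only.
SAMPLE_OUTPUT = json.dumps([
    {"bbox": [40, 180, 560, 260], "category": "Table",
     "text": "<table><tr><td>Q2</td><td>12</td></tr></table>"},
    {"bbox": [40, 30, 560, 70], "category": "Title", "text": "Quarterly Report"},
    {"bbox": [40, 340, 560, 500], "category": "Picture"},
    {"bbox": [40, 90, 560, 160], "category": "Text", "text": "Revenue grew in Q2."},
    {"bbox": [40, 280, 560, 320], "category": "Formula", "text": "E = mc^2"},
])


def group_by_category(raw):
    """Group recognized text by layout category (Title, Table, ...)."""
    grouped = {}
    for element in json.loads(raw):
        # Elements without recognized text (e.g. pictures) get an empty string.
        grouped.setdefault(element["category"], []).append(element.get("text", ""))
    return grouped


def to_plain_text(raw):
    """Join recognized text in a naive top-to-bottom, left-to-right order."""
    elements = json.loads(raw)
    # bbox is assumed to be [x1, y1, x2, y2]; sort by top edge, then left edge.
    ordered = sorted(elements, key=lambda el: (el["bbox"][1], el["bbox"][0]))
    return "\n\n".join(el["text"] for el in ordered if el.get("text"))


if __name__ == "__main__":
    print(group_by_category(SAMPLE_OUTPUT))
    print(to_plain_text(SAMPLE_OUTPUT))
```

Because each element carries both its position and its content, a single pass is enough to rebuild the page text or to pull out only the tables or formulas, with no separate detection and recognition stages to stitch together.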

"dots.ocr: A Multilingual Document Parsing Model" is now available in the "Tutorial" section of HyperAI's official website. Click the link below to deploy it with one click.

Tutorial Link:

https://go.hyper.ai/49mZU

Demo Run

1. On the hyper.ai homepage, open the "Tutorials" page, select "dots.ocr: Multilingual Document Parsing Model", and click "Run this tutorial online".

2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.

3. Select "NVIDIA GeForce RTX 4090" as the compute resource and "PyTorch" as the image, choose "Pay as you go" or a "Daily/Weekly/Monthly Package" according to your needs, and click "Continue". New users can register with the invitation link below to get 4 hours of RTX 4090 time plus 5 hours of CPU time for free!

HyperAI exclusive invitation link (copy and open in browser):

https://openbayes.com/console/signup?r=Ada0322_NR0n

4. Wait for resources to be allocated. The first clone takes approximately 3 minutes. When the status changes to "Running", click the arrow next to "API Address" to open the demo page. Note that you must complete real-name authentication before using the API address.

Effect Demonstration

Taking the "Parse" function as an example, I uploaded an English document, and the result is as follows:

Whether the content is a table or a formula, the model recognizes it accurately:

That's the tutorial HyperAI recommends this time. Everyone is welcome to try it out!

Tutorial Link: https://go.hyper.ai/49mZU

