Document Image Classification on RVL-CDIP
Evaluation Metrics
Accuracy
Parameters
Evaluation Results
Performance of each model on this benchmark
| Model | Accuracy | Parameters | Paper Title | Repository |
|---|---|---|---|---|
| EAML | 97.70% | - | EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification | - |
| Cross-Modal | 97.05% | 197M | Visual and Textual Deep Feature Fusion for Document Image Classification | - |
| DocFormerBASE | 96.17% | 183M | DocFormer: End-to-End Transformer for Document Understanding | |
| LayoutLMV3Large | 95.93% | 368M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | |
| LiLT[EN-R]BASE | 95.68% | - | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | |
| LayoutLMv2LARGE | 95.64% | - | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | |
| TILT-Large | 95.52% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| DocFormer large | 95.50% | 536M | DocFormer: End-to-End Transformer for Document Understanding | |
| LayoutLMv3BASE | 95.44% | 133M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | |
| Donut | 95.3% | - | OCR-free Document Understanding Transformer | |
| TILT-Base | 95.25% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| LayoutLMv2BASE | 95.25% | 200M | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | |
| LayoutXLM | 95.21% | - | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | |
| StrucTexTv2 (large) | 94.62% | 238M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | |
| Pre-trained LayoutLM | 94.42% | 160M | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | |
| DoPTA | 94.12% | 85M | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | - |
| DocXClassifier-B | 94.00% | 95.4M | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | - |
| StrucTexTv2 (small) | 93.4% | 28M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | |
| VLCDoC | 93.19% | 217M | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | - |
| TransferDoc | 93.18% | 221M | GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | - |