Document Image Classification on RVL-CDIP

Evaluation Metrics

Accuracy
Parameters
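Accuracy on this benchmark is single-label classification accuracy over the 16 RVL-CDIP document classes. The sketch below assumes integer class labels and the standard 40,000-image RVL-CDIP test split. It computes accuracy and converts a leaderboard gap, given in percentage points, into an approximate number of test images. The function names are illustrative and do not come from any benchmark toolkit.

```python
from typing import Sequence


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of documents whose predicted class equals the true class."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def gap_in_images(acc_a_percent: float, acc_b_percent: float,
                  test_size: int = 40_000) -> int:
    """Convert an accuracy gap (in percentage points) into a count of test images.

    Assumption: the test split has 40,000 images (the standard RVL-CDIP split).
    """
    return round(abs(acc_a_percent - acc_b_percent) / 100 * test_size)


if __name__ == "__main__":
    # Gap between the two highest accuracies in the table (97.70% vs 97.05%)
    print(gap_in_images(97.70, 97.05))  # 260
```

Under that assumption, one percentage point corresponds to 400 test images, so many adjacent entries in the table differ by only a few dozen documents.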

Evaluation Results

Performance of each model on this benchmark:

Model | Accuracy | Parameters | Paper Title
EAML | 97.70% | - | EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
Cross-Modal | 97.05% | 197M | Visual and Textual Deep Feature Fusion for Document Image Classification
DocFormer-BASE | 96.17% | 183M | DocFormer: End-to-End Transformer for Document Understanding
LayoutLMv3-Large | 95.93% | 368M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
LiLT[EN-R]-BASE | 95.68% | - | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
LayoutLMv2-LARGE | 95.64% | - | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
TILT-Large | 95.52% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
DocFormer large | 95.50% | 536M | DocFormer: End-to-End Transformer for Document Understanding
LayoutLMv3-BASE | 95.44% | 133M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Donut | 95.3% | - | OCR-free Document Understanding Transformer
TILT-Base | 95.25% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
LayoutLMv2-BASE | 95.25% | 200M | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
LayoutXLM | 95.21% | - | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
StrucTexTv2 (large) | 94.62% | 238M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
Pre-trained LayoutLM | 94.42% | 160M | LayoutLM: Pre-training of Text and Layout for Document Image Understanding
DoPTA | 94.12% | 85M | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment
DocXClassifier-B | 94.00% | 95.4M | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification
StrucTexTv2 (small) | 93.4% | 28M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
VLCDoC | 93.19% | 217M | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
TransferDoc | 93.18% | 221M | GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification