Visual and Textual Deep Feature Fusion for Document Image Classification

Marçal Rusiñol, Mickael Coustaty, Ziheng Ming, Souhail Bakkali

Abstract

Text document image classification has been explored extensively over the past few years. Most recent approaches handle this task by jointly learning the visual features of document images and their corresponding textual content. Because document images vary widely in structure, extracting semantic information from their textual content benefits document image processing tasks such as document retrieval, information extraction, and text classification. In this work, a two-stream neural architecture is proposed to perform document image classification. We conduct an exhaustive investigation of widely used neural networks and word embedding procedures as backbones to extract both visual and textual features from document images. Moreover, a joint feature learning approach that combines image features and text embeddings is introduced as a late fusion methodology. Both the theoretical analysis and the experimental results demonstrate the superiority of the proposed joint feature learning method over the single modalities. This joint learning approach outperforms the state-of-the-art results with a classification accuracy of 97.05% on the large-scale RVL-CDIP dataset.
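
The late fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fusion operator (plain concatenation followed by one dense layer), the feature sizes, the random weights, and the names `late_fusion_predict`, `VISUAL_DIM`, and `TEXT_DIM` are all assumptions chosen for illustration. Only the class count of 16 reflects the RVL-CDIP dataset. In a real system the two feature vectors would come from a trained image backbone and a trained text embedding model.

```python
import math
import random

NUM_CLASSES = 16   # RVL-CDIP has 16 document categories
VISUAL_DIM = 8     # hypothetical size of the visual stream's feature vector
TEXT_DIM = 4       # hypothetical size of the textual stream's feature vector


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features, weights, bias):
    """Dense layer: one output score per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def late_fusion_predict(visual_feat, text_feat, weights, bias):
    """Fuse the two streams after feature extraction, then classify.

    Assumption: fusion is simple concatenation; the paper may use a
    different operator.
    """
    fused = list(visual_feat) + list(text_feat)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("weight rows must match the fused feature size")
    return softmax(linear(fused, weights, bias))


# Stand-in inputs: random vectors replace the outputs of real backbones.
rng = random.Random(0)
visual = [rng.random() for _ in range(VISUAL_DIM)]
text = [rng.random() for _ in range(TEXT_DIM)]
W = [[rng.uniform(-1, 1) for _ in range(VISUAL_DIM + TEXT_DIM)]
     for _ in range(NUM_CLASSES)]
b = [0.0] * NUM_CLASSES

probs = late_fusion_predict(visual, text, W, b)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

Each stream is processed independently until the final classifier, which is what distinguishes late fusion from approaches that mix modalities at the input or in intermediate layers.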

Benchmarks

Benchmark: document-image-classification-on-rvl-cdip
Methodology: Cross-Modal
Metrics: Accuracy 97.05%, Parameters 197M
