CoVA: Context-aware Visual Attention for Webpage Information Extraction

Anurendra Kumar; Keval Morabia; Jingjin Wang; Kevin Chen-Chuan Chang; Alexander Schwing

Abstract

Webpage information extraction (WIE) is an important step in creating knowledge bases. Classical WIE methods leverage the Document Object Model (DOM) tree of a website. However, relying on the DOM tree poses significant challenges because context and appearance are encoded in an abstract manner. To address this challenge, we propose to reformulate WIE as a context-aware Webpage Object Detection task. Specifically, we develop a Context-aware Visual Attention-based (CoVA) detection pipeline that combines appearance features with syntactical structure from the DOM tree. To study the approach, we collect a new large-scale dataset of e-commerce websites, for which we manually annotate every web element with one of four labels: product price, product title, product image, and background. On this dataset, we show that the proposed CoVA approach is a new challenging baseline that improves upon prior state-of-the-art methods.
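The idea of letting a web element attend to its surrounding context can be illustrated with a small sketch. This is not the authors' implementation: it assumes each web element has already been reduced to a fixed-length feature vector, and it swaps in generic scaled dot-product attention as a stand-in for whatever attention mechanism CoVA actually uses. The function name `attend_to_context` and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_to_context(target, context):
    """Fuse a target element's features with an attention-weighted
    summary of its context elements (hypothetical helper).

    Returns (weights, fused), where fused is the target vector
    concatenated with the weighted average of the context vectors.
    """
    dim = len(target)
    if not context:
        # No neighbors: pad the context half with zeros.
        return [], list(target) + [0.0] * dim
    scale = math.sqrt(dim)
    weights = softmax([dot(target, c) / scale for c in context])
    pooled = [
        sum(w * c[i] for w, c in zip(weights, context))
        for i in range(dim)
    ]
    return weights, list(target) + pooled


# Toy, made-up features for one candidate element and two neighbors.
candidate = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]
weights, fused = attend_to_context(candidate, neighbors)
```

In this toy setup the neighbor whose features resemble the candidate receives the larger weight, and the fused vector is what a downstream classifier would see when choosing among the four labels.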

Code Repositories

kevalmorabia97/cova-web-object-detection (official, PyTorch)

Benchmarks

Benchmark: webpage-object-detection-on-cova

Methodology | Cross Domain Image Accuracy | Cross Domain Price Accuracy | Cross Domain Title Accuracy
CoVA++ | 99.6 | 96.1 | 96.7
CoVA | 98.8 | 95.5 | 95.7
