Pre-training Meets Clustering: A Hybrid Extractive Multi-document Summarization Model

Seba Susan, Akanksha Karotia

Abstract

With the volume of information now available on the Internet, manually extracting and consuming the relevant parts is difficult and time-consuming. An automated summarization tool is therefore needed to excerpt the important information from a set of documents on similar or related subjects. Multi-document summarization retrieves important and relevant content from multiple documents while minimizing redundancy. In this study, we develop a multi-document text summarization system using an unsupervised extractive approach. The proposed model fuses two learning paradigms: the pre-trained T5 transformer model and the K-Means clustering algorithm. We run experiments on the Document Understanding Conference 2004 (DUC2004) benchmark corpus of news articles and evaluate performance with the ROUGE metrics. The results show that the proposed model performs considerably better than existing unsupervised state-of-the-art approaches.
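The abstract names the two ingredients (sentence representations from a pre-trained model, then K-Means clustering) but not how they are combined. The following is a minimal sketch of one common way to do cluster-based extractive selection, not the authors' implementation. It assumes that one sentence is picked per cluster, namely the one closest to the centroid. To stay self-contained it also replaces T5 representations with a hypothetical bag-of-words `embed` function; the names `embed`, `kmeans`, and `summarize` are illustrative only.

```python
import math
import random
from collections import Counter


def embed(sentence, vocab):
    """Stand-in embedding: L2-normalised bag-of-words vector.
    Assumption: a real system would use T5 representations here."""
    counts = Counter(sentence.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(vectors, centroids)


def summarize(sentences, k):
    """Pick the sentence nearest each cluster centroid, in document order."""
    k = min(k, len(sentences))
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = [embed(s, vocab) for s in sentences]
    centroids, labels = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        if idx:
            chosen.append(min(idx, key=lambda i: sq_dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k)` on the sentences pooled from all documents in a topic returns at most `k` sentences. Because near-duplicate sentences tend to fall into the same cluster, selecting one per cluster is a simple way to limit redundancy across documents.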

Benchmarks

Benchmark: extractive-text-summarization-on-duc-2004-1
Methodology: Pre-training-meets-Clustering-A-Hybrid-Extractive-Multi-Document-Summarization-Model
Metrics:
Test ROUGE-1: 34.013
Test ROUGE-2: 8.266
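For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a system summary and a reference summary. The sketch below computes a simplified ROUGE-N recall with whitespace tokenization and a single reference. It is only an illustration: the official ROUGE toolkit adds options such as stemming and multiple references, and this page does not state whether the figures above are recall or F-scores. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams also found in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


r1 = rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", 1)
r2 = rouge_n_recall("the cat sat on the mat", "the cat lay on the mat", 2)
```

In this toy example `r1` is 5/6 and `r2` is 3/5. Scores such as 34.013 are typically the same kind of ratio expressed as a percentage and averaged over the test set.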
