WikiHow: A Large Scale Text Summarization Dataset

Mahnaz Koupaee; William Yang Wang

Abstract

Sequence-to-sequence models have recently achieved state-of-the-art performance in summarization. However, few large-scale, high-quality datasets are available, and almost all of them consist of news articles written in a specific journalistic style. Moreover, abstractive, human-style systems that describe content at a deeper level require data with a higher level of abstraction. In this paper, we present WikiHow, a dataset of more than 230,000 article-summary pairs extracted and constructed from an online knowledge base written by different human authors. The articles span a wide range of topics and therefore exhibit a high diversity of writing styles. We evaluate existing methods on WikiHow to present its challenges and to set baselines for further improvement.
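
The dataset is distributed as article-summary pairs. Below is a minimal loading sketch, assuming the commonly circulated wikihowAll.csv release with "title", "headline" (summary), and "text" (article) columns; the file and column names are assumptions about that release, not details stated in the abstract above.

```python
# Minimal sketch: load WikiHow article-summary pairs from the released CSV.
# Assumption: the file is "wikihowAll.csv" with columns "title",
# "headline" (the summary) and "text" (the full article); adjust the
# names to match the release you actually download.
import pandas as pd

def load_wikihow_pairs(path="wikihowAll.csv"):
    df = pd.read_csv(path)
    # Drop rows where either side of the pair is missing.
    df = df.dropna(subset=["headline", "text"])
    # Return (article, summary) tuples.
    return list(zip(df["text"], df["headline"]))

if __name__ == "__main__":
    pairs = load_wikihow_pairs()
    print(f"Loaded {len(pairs)} article-summary pairs")
    article, summary = pairs[0]
    print("Example summary:", summary[:200])
```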

Code Repositories

LubdaMax/Data-Science-1 (TensorFlow)
anbunathan/WikiHow-Semantic
Wikidepia/indonesia_dataset
dengyang17/wikihowQA
stancld/GeneratingHeadlines_GANs (PyTorch)
pvl/wikihow_pairs_dataset
stancld/GeneratingHeadline_GANs (PyTorch)

Benchmarks

Benchmark: text-summarization-on-wikihow
Methodology: Pointer-generator + coverage
Metrics: ROUGE-1: 28.53 | ROUGE-2: 9.23 | ROUGE-L: 26.54
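
The reported metrics are standard ROUGE scores. As a minimal sketch (not the exact evaluation setup behind the benchmark numbers above), ROUGE-1, ROUGE-2, and ROUGE-L can be computed for a generated summary with the rouge-score package:

```python
# Minimal sketch: score one generated summary against its reference with
# the rouge-score package (pip install rouge-score). Illustrative only;
# the benchmark above may use a different evaluation pipeline.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "remove the stain with white vinegar and rinse the fabric"
generated = "use white vinegar to remove the stain and rinse"

scores = scorer.score(reference, generated)
for name, result in scores.items():
    # Each result holds precision, recall, and F1; benchmarks usually report F1.
    print(f"{name}: F1 = {result.fmeasure:.4f}")
```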
