3 months ago

Revisiting non-English Text Simplification: A Unified Multilingual Benchmark

Michael J. Ryan, Tarek Naous, Wei Xu

Abstract

Recent advancements in high-quality, large-scale English resources have pushed the frontier of English Automatic Text Simplification (ATS) research. However, less work has been done on multilingual text simplification due to the lack of a diverse evaluation benchmark that covers complex-simple sentence pairs in many languages. This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs. This benchmark will encourage research in developing more effective multilingual text simplification models and evaluation metrics. Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings. We observe strong performance from Russian in zero-shot cross-lingual transfer to low-resource languages. We further show that few-shot prompting with BLOOM-176b achieves comparable quality to reference simplifications, outperforming fine-tuned models in most languages. We validate these findings through human evaluation.
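
As a rough illustration of the few-shot prompting setup mentioned in the abstract, the sketch below assembles a prompt from a handful of complex-simple sentence pairs followed by the sentence to simplify. The `Complex:`/`Simple:` template, the function name `build_few_shot_prompt`, and the example sentences are all assumptions made for illustration; they are not taken from the paper or from the MultiSim data.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot simplification prompt (hypothetical template)."""
    # Each demonstration shows a complex sentence and its simplification.
    blocks = [f"Complex: {c}\nSimple: {s}" for c, s in examples]
    # The final block leaves "Simple:" open for the model to complete.
    blocks.append(f"Complex: {query}\nSimple:")
    return "\n\n".join(blocks)


# Illustrative, made-up sentence pairs; not drawn from MultiSim.
examples = [
    ("The committee deliberated at length before reaching a verdict.",
     "The committee talked for a long time before deciding."),
    ("Precipitation is anticipated throughout the afternoon.",
     "Rain is expected this afternoon."),
]
prompt = build_few_shot_prompt(examples, "The physician administered the medication.")
print(prompt)
```

The resulting string would be passed to a generative language model, and the text produced after the final `Simple:` would be read as the candidate simplification.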

Code Repositories

xenonmolecule/multisim
Official
tf
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
text-simplification-on-wikilargefr | mT5 (fine-tuned on MULTI-SIM) | SARI: 39.23
