4 months ago

LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

Yuhao Wu Yushi Bai Zhiqiang Hu Roy Ka-Wei Lee Juanzi Li

Abstract

Ultra-long generation by large language models (LLMs) is a widely demanded scenario, yet it remains a significant challenge due to their maximum generation length limit and overall quality degradation as sequence length increases. Previous approaches, exemplified by LongWriter, typically rely on "teaching", which involves supervised fine-tuning (SFT) on synthetic long-form outputs. However, this strategy heavily depends on synthetic SFT data, which is difficult and costly to construct, often lacks coherence and consistency, and tends to be overly artificial and structurally monotonous. In this work, we propose an incentivization-based approach that, starting entirely from scratch and without relying on any annotated or synthetic data, leverages reinforcement learning (RL) to foster the emergence of ultra-long, high-quality text generation capabilities in LLMs. We perform RL training starting from a base model, similar to R1-Zero, guiding it to engage in reasoning that facilitates planning and refinement during the writing process. To support this, we employ specialized reward models that steer the LLM towards improved length control, writing quality, and structural formatting. Experimental evaluations show that our LongWriter-Zero model, trained from Qwen2.5-32B, consistently outperforms traditional SFT methods on long-form writing tasks, achieving state-of-the-art results across all metrics on WritingBench and Arena-Write, and even surpassing 100B+ models such as DeepSeek R1 and Qwen3-235B. We open-source our data and model checkpoints under https://huggingface.co/THU-KEG/LongWriter-Zero-32B
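
The abstract mentions three reward signals (length control, writing quality, structural formatting) without giving their form. The following is a minimal illustrative sketch of how such signals might be combined into one scalar for RL training; it is not the authors' implementation. The function names, the `<think>`/`<answer>` tag convention, the piecewise-linear length score, and the equal weighting are all assumptions, and `quality_score` stands in for the output of a learned reward model that is not shown here.

```python
import re


def length_reward(n_words: int, lo: int, hi: int) -> float:
    """Hypothetical length score: 1.0 inside [lo, hi], linear decay outside."""
    if lo <= n_words <= hi:
        return 1.0
    if n_words < lo:
        return max(0.0, n_words / lo)
    # Too long: lose credit in proportion to the overshoot relative to hi.
    return max(0.0, 1.0 - (n_words - hi) / hi)


def format_reward(text: str) -> float:
    """Hypothetical format score: 1.0 if the output is a <think> block
    followed by an <answer> block (tag names are an assumption)."""
    pattern = r"\s*<think>.+?</think>\s*<answer>.+?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, text, flags=re.DOTALL) else 0.0


def combined_reward(text: str, lo: int, hi: int, quality_score: float) -> float:
    """Equal-weight average of the three signals (weighting is an assumption).

    quality_score is expected in [0, 1] and would come from a separate
    writing-quality reward model in a real setup.
    """
    # Rough word count over the whole output, tags included.
    n_words = len(text.split())
    scores = (length_reward(n_words, lo, hi), quality_score, format_reward(text))
    return sum(scores) / len(scores)
```

In this sketch, an output that respects the assumed tag format and falls inside the requested length range receives full credit on two of the three terms, so the remaining variation in reward comes from the quality score.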

Code Repositories

thudm/longwriter
pytorch
Mentioned in GitHub
