
Abstract
While the "deep reasoning" paradigm has spurred significant advances in verifiable domains like mathematics, its application to open-ended, creative generation remains a critical challenge. The two dominant methods for instilling reasoning, reinforcement learning (RL) and instruction distillation, falter in this area: RL struggles with the absence of clear reward signals and high-quality reward models, while distillation is prohibitively expensive and capped by the teacher model's capabilities. To overcome these limitations, we introduce REverse-Engineered Reasoning (REER), a new paradigm that fundamentally shifts the approach. Instead of building a reasoning process "forwards" through trial-and-error or imitation, REER works "backwards" from known-good solutions to computationally discover the latent, step-by-step deep reasoning process that could have produced them. Using this scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks. Our model, DeepWriter-8B, trained on this data, not only surpasses strong open-source baselines but also achieves performance competitive with, and at times superior to, leading proprietary models like GPT-4o and Claude 3.5.