The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models