GVPO: Group Variance Policy Optimization for Large Language Model Post-Training