Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex