SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization