Peak-Return Greedy Slicing

Peak-Return Greedy Slicing (PRGS) is an algorithmic framework jointly proposed by research teams from Shandong University, the Chinese Academy of Sciences, Li Auto, Tsinghua University, and other institutions. The related research, "Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-Based Offline RL", has been accepted by ICLR 2026.

PRGS aims to significantly improve the experience stitching and recombination capability of Transformer-based offline reinforcement learning (offline RL) models through explicit trajectory partitioning at the time-step level. Existing methods often rely on complete trajectories and their final returns alone, which makes it difficult to distinguish superior from inferior segments within long trajectories. PRGS addresses this limitation with three core mechanisms (MMD-based reward estimation, a greedy slicing policy, and adaptive history truncation) that explicitly partition trajectories and extract high-quality sub-trajectories for policy training at the time-step level. Experiments show that PRGS significantly enhances the model's ability to stitch together high-reward experiences, achieving an average performance improvement of 15.8% over the original baseline algorithms across multiple complex environment benchmarks.
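The source does not spell out the slicing algorithm itself, but the greedy slicing idea can be illustrated with a minimal sketch: compute a per-time-step return signal over a trajectory, then greedily carve out non-overlapping windows around the highest-return time steps. The function names, window scheme, and return-to-go signal below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Discounted return-to-go at each time step (assumed peak signal)."""
    rtg = np.zeros(len(rewards), dtype=float)
    acc = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        acc = rewards[t] + gamma * acc
        rtg[t] = acc
    return rtg

def greedy_peak_slices(rewards, window=4, top_k=2):
    """Greedily pick disjoint sub-trajectory windows around return peaks.

    Illustrative sketch only: visit time steps in descending order of
    return-to-go and claim a fixed-size window around each peak, skipping
    any window that would overlap one already selected.
    """
    rtg = returns_to_go(rewards)
    order = np.argsort(-rtg)              # time steps, highest return first
    used = np.zeros(len(rewards), dtype=bool)
    slices = []
    for t in order:
        lo = max(0, t - window // 2)
        hi = min(len(rewards), t + window // 2)
        if used[lo:hi].any():
            continue                      # keep selected slices disjoint
        used[lo:hi] = True
        slices.append((lo, hi))
        if len(slices) == top_k:
            break
    return slices
```

Under this sketch, the selected `(lo, hi)` index pairs would mark the high-return sub-trajectories retained for policy training, while low-return segments are discarded rather than diluting the training signal.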
