Peak-Return Greedy Slicing

Peak-Return Greedy Slicing (PRGS) is an algorithmic framework jointly proposed by research teams from Shandong University, the Chinese Academy of Sciences, Li Auto, Tsinghua University, and other institutions. The related research, "Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-Based Offline RL", has been accepted by ICLR 2026.

PRGS aims to significantly improve the experience stitching and recombination capability of Transformer-based offline reinforcement learning (offline RL) models through explicit trajectory partitioning at the time-step level. Existing methods often rely on complete trajectories and their final returns alone, which makes it difficult to distinguish superior from inferior segments within long trajectories. PRGS addresses this limitation with three core mechanisms (MMD-based reward estimation, a greedy slicing policy, and adaptive history truncation) that explicitly partition trajectories and extract high-quality sub-trajectories for policy training at the time-step level. Experiments show that PRGS significantly enhances the model's ability to stitch together high-reward experiences, achieving an average performance improvement of 15.8% over the original baseline algorithms across multiple complex environment benchmarks.
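The source does not spell out the slicing algorithm itself, but the greedy slicing idea can be illustrated with a minimal sketch: compute a per-time-step return signal over a trajectory, then greedily carve out non-overlapping windows around the highest-return time steps. The function names, window scheme, and return-to-go signal below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Discounted return-to-go at each time step (assumed peak signal)."""
    rtg = np.zeros(len(rewards), dtype=float)
    acc = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        acc = rewards[t] + gamma * acc
        rtg[t] = acc
    return rtg

def greedy_peak_slices(rewards, window=4, top_k=2):
    """Greedily pick disjoint sub-trajectory windows around return peaks.

    Illustrative sketch only: visit time steps in descending order of
    return-to-go and claim a fixed-size window around each peak, skipping
    any window that would overlap one already selected.
    """
    rtg = returns_to_go(rewards)
    order = np.argsort(-rtg)              # time steps, highest return first
    used = np.zeros(len(rewards), dtype=bool)
    slices = []
    for t in order:
        lo = max(0, t - window // 2)
        hi = min(len(rewards), t + window // 2)
        if used[lo:hi].any():
            continue                      # keep selected slices disjoint
        used[lo:hi] = True
        slices.append((lo, hi))
        if len(slices) == top_k:
            break
    return slices
```

Under this sketch, the selected `(lo, hi)` index pairs would mark the high-return sub-trajectories retained for policy training, while low-return segments are discarded rather than diluting the training signal.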
