Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning

Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu

Abstract

Model-based methods in reinforcement learning offer a promising approach to enhancing data efficiency by facilitating policy exploration within a dynamics model. However, accurately predicting sequential steps with the dynamics model remains a challenge because of bootstrapping prediction, in which each next-state prediction depends on the previously predicted state, so errors accumulate during model rollout. In this paper, we propose the Any-step Dynamics Model (ADM) to mitigate this compounding error by replacing bootstrapping prediction with direct prediction. ADM accepts variable-length plans as input and predicts future states without frequent bootstrapping. We design two algorithms, ADMPO-ON and ADMPO-OFF, which apply ADM in online and offline model-based frameworks, respectively. In the online setting, ADMPO-ON achieves better sample efficiency than previous state-of-the-art methods. In the offline setting, ADMPO-OFF not only outperforms recent state-of-the-art offline approaches but also provides better quantification of model uncertainty using only a single ADM.
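To make the abstract's core idea concrete, the sketch below shows what an "any-step" dynamics model interface could look like. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation from HxLyn3/ADMPO: the class name, network sizes, and the GRU plan encoder are all assumptions. It only illustrates the contrast the abstract draws: instead of composing a one-step model k times (bootstrapping), the model takes the current state together with a variable-length action plan and predicts the state k steps ahead in a single forward pass.

```python
# Illustrative sketch only (not the official ADMPO code): an "any-step"
# dynamics model that maps (s_t, a_t..a_{t+k-1}) directly to s_{t+k},
# avoiding repeated bootstrapping through intermediate predicted states.
# All names and architecture choices here are assumptions for illustration.
import torch
import torch.nn as nn


class AnyStepDynamicsModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Condition a GRU on the starting state via its initial hidden state,
        # then feed it the variable-length action sequence (the "plan").
        self.state_enc = nn.Linear(state_dim, hidden_dim)
        self.action_rnn = nn.GRU(action_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),  # direct prediction of s_{t+k}
        )

    def forward(self, state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # state:   (batch, state_dim)      -- s_t
        # actions: (batch, k, action_dim)  -- a_t, ..., a_{t+k-1}; k can vary per call
        h0 = torch.tanh(self.state_enc(state)).unsqueeze(0)  # (1, batch, hidden)
        _, h_k = self.action_rnn(actions, h0)                # summary of the k-step plan
        return self.head(h_k.squeeze(0))                     # predicted s_{t+k}


if __name__ == "__main__":
    model = AnyStepDynamicsModel(state_dim=17, action_dim=6)
    s_t = torch.randn(32, 17)
    plan = torch.randn(32, 5, 6)      # a 5-step action plan
    s_t_plus_5 = model(s_t, plan)     # one forward pass, no bootstrapped rollout
    print(s_t_plus_5.shape)           # torch.Size([32, 17])
```

Because the prediction target is reached in one pass regardless of k, errors in intermediate states never feed back into later inputs, which is the compounding-error mechanism the abstract describes.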

Code Repositories

HxLyn3/ADMPO (official, PyTorch)

Benchmarks

Benchmark            Methodology   Metrics
offline-rl-on-d4rl   ADMPO         Average Reward: 81
