
Proximal Policy Optimization


Proximal Policy Optimization (PPO) is a reinforcement learning algorithm used to train the decision-making function of a computer agent so it can complete difficult tasks. PPO was developed by John Schulman in 2017 and has become the default reinforcement learning algorithm at OpenAI, the American artificial intelligence company. By 2018, PPO had achieved a variety of successes, such as controlling a robotic arm, beating professional players at Dota 2, and performing well in Atari games. Many practitioners regard PPO as state of the art because it strikes a good balance between performance and ease of understanding. Compared with other algorithms, the three main advantages of PPO are simplicity, stability, and sample efficiency.

Advantages of PPO

  • Simplicity: PPO approximates what TRPO does with far less computation. It constrains the policy update with a first-order method, a clipping function applied to the surrogate objective (see the sketch after this list), whereas TRPO enforces a KL-divergence constraint outside the objective function, which requires second-order optimization. PPO is therefore easier to implement and takes less computation time than TRPO, making it cheaper and more practical for large-scale problems.
  • Stability: While many reinforcement learning algorithms are sensitive to hyperparameter tuning, PPO usually needs little of it (a clip ratio of epsilon = 0.2 works in most cases). PPO also does not require complex optimization techniques: it can be trained with standard deep learning frameworks and generalizes to a wide range of tasks.
  • Sample efficiency: Sample efficiency describes how much data an algorithm needs to learn a good policy. PPO achieves it through its surrogate objective, which prevents the new policy from deviating too far from the old one; the clip function regularizes policy updates and allows training data to be reused. This is particularly valuable for complex, high-dimensional tasks where data collection and computation are expensive.
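All three advantages rest on the same mechanism: the clipped surrogate objective L^CLIP(θ) = E_t[ min( r_t(θ) A_t, clip(r_t(θ), 1 − ε, 1 + ε) A_t ) ], where r_t(θ) is the probability ratio between the new and old policies and A_t is the advantage estimate. The snippet below is a minimal NumPy sketch of that loss, not an implementation from this article; the function name, the toy inputs, and the default ε = 0.2 are illustrative assumptions.

```python
import numpy as np

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Clipped surrogate objective of PPO (to be maximized, or negated for gradient descent).

    new_log_probs : log pi_theta(a_t | s_t) under the current policy
    old_log_probs : log pi_theta_old(a_t | s_t) under the policy that collected the data
    advantages    : advantage estimates A_t for the sampled actions
    epsilon       : clip ratio; 0.2 is the commonly used default
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old
    ratio = np.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Taking the element-wise minimum keeps the update pessimistic,
    # so the policy gains nothing from moving far away from the old policy
    return np.mean(np.minimum(unclipped, clipped))

# Toy usage with made-up numbers
new_lp = np.array([-0.9, -1.2, -0.4])
old_lp = np.array([-1.0, -1.0, -1.0])
adv    = np.array([0.5, -0.3, 1.2])
print(ppo_clip_loss(new_lp, old_lp, adv))  # scalar surrogate objective
```

Because the clipping keeps each update small, the same batch of collected experience can safely be reused for several gradient steps, which is where PPO's sample efficiency comes from.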

