
On Policy


On-policy ("same strategy") means that the policy used to generate samples is identical to the policy whose parameters are being updated. The agent selects its next action according to the current policy and then uses the resulting sample to update that same policy: the policy that generates the data is the policy being learned.

SARSA algorithm

SARSA (State-Action-Reward-State-Action) is an algorithm for learning a policy in a Markov decision process, and it is commonly used in reinforcement learning within the field of machine learning.
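In the tabular case, the standard SARSA update applied after each transition (s, a, r, s', a') is the following, where α is the learning rate and γ the discount factor:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$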

Key points of SARSA algorithm

  • When in state s', the agent already knows which action a' it will take, and it actually takes that action;
  • Action a is selected by the ε-greedy policy, and the target Q value is computed from the action a' produced by that same ε-greedy policy, which is what makes SARSA on-policy learning (see the sketch after this list).
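A minimal tabular SARSA sketch in Python illustrating these points. The 5-state chain environment, its reward scheme, and the hyperparameters are illustrative assumptions, not part of the original article:

```python
import random

# Hypothetical 5-state chain: action 0 moves left, action 1 moves right;
# reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters

def step(state, action):
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def epsilon_greedy(Q, state):
    # With probability ε explore a random action; otherwise exploit current Q.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(500):
    state = 0
    action = epsilon_greedy(Q, state)              # choose a from s via ε-greedy
    done = False
    while not done:
        next_state, reward, done = step(state, action)
        next_action = epsilon_greedy(Q, next_state)  # a' from the SAME policy
        # On-policy target: bootstrap from Q(s', a') for the action the agent
        # will actually execute next, not from the greedy maximum.
        target = reward + (0.0 if done else GAMMA * Q[next_state][next_action])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state, action = next_state, next_action
```

Because a' is the action the agent actually executes next, SARSA learns the value of the ε-greedy policy itself rather than the value of the purely greedy policy.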

Advantages and disadvantages of on-policy learning

  • Advantages: the value function can be updated at every step, so learning is fast; it also handles continuing tasks that never reach a terminal outcome, giving it a wide range of applications.
  • Disadvantages: it runs into the exploration-exploitation trade-off. Relying only on the currently known best action may prevent the agent from discovering the true optimal solution and can cause convergence to a local optimum, while adding exploration reduces learning efficiency.

On-policy and off-policy

The difference between on-policy and off-policy methods is which policy is used when updating the Q value: an on-policy method updates toward the value of the very policy it is following, while an off-policy method follows one behavior policy but updates toward a different target policy (typically the greedy one).
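The distinction is easiest to see in code. A minimal sketch of the two update targets, reusing the hypothetical Q, next_state, next_action, reward, and GAMMA names from the SARSA example above and omitting terminal-state handling:

```python
# On-policy (SARSA): bootstrap from the action a' the agent will actually take.
sarsa_target = reward + GAMMA * Q[next_state][next_action]

# Off-policy (Q-learning): bootstrap from the greedy action, regardless of
# which action the behavior policy actually takes next.
q_learning_target = reward + GAMMA * max(Q[next_state])
```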
