Off Policy

Date

7 years ago

Off-policy means that the policy used to generate new samples differs from the policy being improved when the network updates its parameters. A typical example is the Q-learning algorithm.

The idea behind off-policy learning

In off-policy learning, the policy being learned differs from the policy used for sampling. A large amount of behavior data is first generated under some probability distribution over actions, and the target policy is then learned from this data, even though the data was not produced by the target (optimal) policy itself.

This approach requires the following condition to hold: if π is the target policy and μ is the behavior policy, then learning about π from data generated by μ requires that μ(a | s) > 0 whenever π(a | s) > 0. In other words, every action the target policy might take must also be taken, at least occasionally, by the behavior policy.
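The condition above can be checked mechanically for small tabular policies. The following is a minimal sketch, assuming each policy is stored as a nested dictionary mapping state to action to probability; the function name `covers` and the example policies are hypothetical and only illustrate the inequality.

```python
def covers(pi, mu):
    """Return True if mu(a|s) > 0 wherever pi(a|s) > 0."""
    for state, target_probs in pi.items():
        for action, p in target_probs.items():
            # A missing state or action in mu is treated as probability 0.
            if p > 0 and mu.get(state, {}).get(action, 0.0) <= 0:
                return False
    return True


# Hypothetical one-state example: the target policy always goes right.
pi = {"s0": {"left": 0.0, "right": 1.0}}
# An exploratory behavior policy that tries both actions satisfies the condition.
mu_ok = {"s0": {"left": 0.5, "right": 0.5}}
# A behavior policy that never goes right cannot be used to learn about pi.
mu_bad = {"s0": {"left": 1.0, "right": 0.0}}

print(covers(pi, mu_ok))
print(covers(pi, mu_bad))
```

Note that the check is one-directional: the behavior policy may take actions the target policy never takes, but not the other way around.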

Q-learning algorithm

The Q-learning algorithm learns how to choose the next action from the rewards and penalties it observes. Here Q denotes the quality function of the policy π, which maps each state-action pair (s, a) to the total expected future reward obtained after observing state s and taking action a.

Q-learning is model-free: it does not model the dynamics of the MDP, but directly estimates the Q value of every action in each state. The corresponding policy is then obtained by selecting the action with the highest Q value in each state.

If the agent keeps visiting every state-action pair, the Q-learning algorithm converges to the optimal Q function (in the tabular setting, given a suitably decaying learning rate).
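To make the description concrete, here is a minimal tabular Q-learning sketch. The four-state chain environment, the reward of 1 for reaching the last state, and the hyperparameters are all assumptions chosen for illustration; they do not come from the article. The behavior policy is uniformly random, while the update bootstraps from the greedy action, which is what makes the method off-policy.

```python
import random

N_STATES, GOAL = 4, 3   # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)        # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics; reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)  # behavior policy: uniform random
            s2, r, done = step(s, a)
            # Target uses the greedy (max) action in the next state,
            # regardless of what the behavior policy will actually do.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
# Greedy policy read off the learned table for the non-terminal states.
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy)
```

Because the random behavior policy keeps trying every action in every state, each state-action pair is visited repeatedly, and the learned greedy policy moves right toward the goal even though the agent itself never acted greedily. A constant learning rate is sufficient here only because this toy environment is deterministic.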

Advantages of off-policy learning

  • The agent can learn from demonstrations given by humans or from guidance samples produced by other agents;
  • Experience generated by older policies can be reused;
  • A deterministic policy can be learned while an exploratory policy is used to act;
  • A single policy can be used for sampling while several policies are learned at the same time.
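The first two points can be illustrated by learning from a fixed log of transitions, with no further interaction with the environment. This is a minimal sketch under stated assumptions: the transition log, the function name `fit_q_from_log`, and the hyperparameters are hypothetical, and the log is assumed to cover every state-action pair of a small deterministic chain.

```python
def fit_q_from_log(transitions, n_states, n_actions,
                   sweeps=200, alpha=0.5, gamma=0.9):
    """Repeatedly apply the Q-learning update to a fixed batch of transitions."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s2, done in transitions:
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Hypothetical log of (state, action, reward, next_state, done) tuples,
# e.g. recorded from a human demonstrator or an older policy.
# Actions: 0 = left, 1 = right; state 3 is terminal.
log = [
    (0, 1, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 1, 1.0, 3, True),
    (2, 0, 0.0, 1, False),
    (1, 0, 0.0, 0, False),
    (0, 0, 0.0, 0, False),
]

Q_log = fit_q_from_log(log, n_states=4, n_actions=2)
policy = [max(range(2), key=lambda a: Q_log[s][a]) for s in range(3)]
print(policy)
```

The update rule never asks which policy produced a transition, so the same log could in principle be reused to learn several different target policies.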
Related terms: on-policy, policy function
