
On Policy


On-policy ("same strategy") means that the policy used to generate samples is identical to the policy whose parameters are being updated. The agent selects its next action according to the current policy and then uses the resulting sample to update that same policy: the policy that generates the data is the policy being learned.

SARSA algorithm

SARSA (State-Action-Reward-State-Action) is an algorithm for learning a policy in a Markov decision process, and it is commonly used in reinforcement learning within the field of machine learning.
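In the tabular case, the standard SARSA update applied after each transition (s, a, r, s', a') is the following, where α is the learning rate and γ the discount factor:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$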

Key points of SARSA algorithm

  • When in state s', the agent already knows which action a' it will take, and it actually takes that action;
  • Action a is selected by the ε-greedy policy, and the target Q value is computed from the action a' produced by that same ε-greedy policy, which is what makes SARSA on-policy learning (see the sketch after this list).
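A minimal tabular SARSA sketch in Python illustrating these points. The 5-state chain environment, its reward scheme, and the hyperparameters are illustrative assumptions, not part of the original article:

```python
import random

# Hypothetical 5-state chain: action 0 moves left, action 1 moves right;
# reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters

def step(state, action):
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def epsilon_greedy(Q, state):
    # With probability ε explore a random action; otherwise exploit current Q.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(500):
    state = 0
    action = epsilon_greedy(Q, state)              # choose a from s via ε-greedy
    done = False
    while not done:
        next_state, reward, done = step(state, action)
        next_action = epsilon_greedy(Q, next_state)  # a' from the SAME policy
        # On-policy target: bootstrap from Q(s', a') for the action the agent
        # will actually execute next, not from the greedy maximum.
        target = reward + (0.0 if done else GAMMA * Q[next_state][next_action])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state, action = next_state, next_action
```

Because a' is the action the agent actually executes next, SARSA learns the value of the ε-greedy policy itself rather than the value of the purely greedy policy.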

Advantages and disadvantages of on-policy learning

  • Advantages: the value function can be updated at every step, so learning is fast; it also handles continuing tasks that never reach a terminal outcome, giving it a wide range of applications.
  • Disadvantages: it runs into the exploration-exploitation trade-off. Relying only on the currently known best action may prevent the agent from discovering the true optimal solution and can cause convergence to a local optimum, while adding exploration reduces learning efficiency.

On-policy and off-policy

The difference between on-policy and off-policy methods is which policy is used when updating the Q value: an on-policy method updates toward the value of the very policy it is following, while an off-policy method follows one behavior policy but updates toward a different target policy (typically the greedy one).
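The distinction is easiest to see in code. A minimal sketch of the two update targets, reusing the hypothetical Q, next_state, next_action, reward, and GAMMA names from the SARSA example above and omitting terminal-state handling:

```python
# On-policy (SARSA): bootstrap from the action a' the agent will actually take.
sarsa_target = reward + GAMMA * Q[next_state][next_action]

# Off-policy (Q-learning): bootstrap from the greedy action, regardless of
# which action the behavior policy actually takes next.
q_learning_target = reward + GAMMA * max(Q[next_state])
```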
