4 months ago

Counterfactual Multi-Agent Policy Gradients

Jakob Foerster; Gregory Farquhar; Triantafyllos Afouras; Nantas Nardelli; Shimon Whiteson

Abstract

Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best-performing agents are competitive with state-of-the-art centralised controllers that have access to the full state.
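The counterfactual baseline described in the abstract can be illustrated with a small sketch. It assumes the centralised critic has already produced, for one agent, a Q-value for each of that agent's possible actions while the other agents' actions stay fixed. The function name `counterfactual_advantage` and its parameters are hypothetical and chosen for illustration; this is not the authors' implementation and it omits the critic network itself.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy: Sequence[float],
    chosen_action: int,
) -> float:
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values[i]: assumed critic estimate of Q for the joint action in which
        this agent takes action i and all other agents keep their actual actions.
    policy[i]: probability this agent's policy assigns to action i.
    chosen_action: index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    if abs(sum(policy) - 1.0) > 1e-6:
        raise ValueError("policy must sum to 1")

    # Baseline: marginalise out this agent's action under its own policy,
    # holding the other agents' actions fixed.
    baseline = sum(p * q for p, q in zip(policy, q_values))

    # Advantage: how much better the chosen action is than the baseline.
    return q_values[chosen_action] - baseline


# Hypothetical numbers for a single agent with three actions.
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
advantage = counterfactual_advantage(q, pi, chosen_action=1)
print(advantage)  # 3.0 - (0.5*1.0 + 0.25*3.0 + 0.25*2.0) = 1.25
```

Because the baseline is the policy-weighted mean of the same Q-values, the advantage averaged over the agent's own policy is zero, so the baseline depends only on what the other agents did and not on the action this agent took.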

Code Repositories

Repository	Framework	Source
opendilab/DI-engine/blob/main/ding/policy/coma.py	pytorch	
hanhanAnderson/LSF-SAC	pytorch	
puyuan1996/MARL	pytorch	Mentioned in GitHub
TonghanWang/NDQ	pytorch	Mentioned in GitHub
gingkg/smac	pytorch	Mentioned in GitHub
nice-hku/cl2marl-smac	pytorch	Mentioned in GitHub
matteokarldonati/Counterfactual-Multi-Agent-Policy-Gradients	pytorch	Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
smac-on-smac-def-armored-parallel	COMA	Median Win Rate: 0.0
smac-on-smac-def-armored-sequential	COMA	Median Win Rate: 0.0
smac-on-smac-def-infantry-parallel	COMA	Median Win Rate: 50.0
smac-on-smac-def-infantry-sequential	COMA	Median Win Rate: 28.1
smac-on-smac-def-outnumbered-parallel	COMA	Median Win Rate: 0.0
smac-on-smac-def-outnumbered-sequential	COMA	Median Win Rate: 0.0
smac-on-smac-off-complicated-parallel	COMA	Median Win Rate: 0.0
smac-on-smac-off-complicated-sequential	COMA	Median Win Rate: 0.0
smac-on-smac-off-distant-parallel	COMA	Median Win Rate: 0.0
smac-on-smac-off-distant-sequential	COMA	Median Win Rate: 0.0
smac-on-smac-off-hard-parallel	COMA	Median Win Rate: 0.0
smac-on-smac-off-hard-sequential	COMA	Median Win Rate: 0.0
smac-on-smac-off-near-parallel	COMA	Median Win Rate: 20.0
smac-on-smac-off-near-sequential	COMA	Median Win Rate: 0.0
smac-on-smac-off-superhard-parallel	COMA	Median Win Rate: 0.0
smac-on-smac-off-superhard-sequential	COMA	Median Win Rate: 0.0
