Jakob Foerster; Gregory Farquhar; Triantafyllos Afouras; Nantas Nardelli; Shimon Whiteson

Abstract
Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best-performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
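The counterfactual baseline described in the abstract can be illustrated with a small sketch. It assumes the centralised critic has already returned, for one agent, a Q-value for each of that agent's candidate actions while the other agents' actions are held fixed. The function name `counterfactual_advantage` and the toy numbers are hypothetical and are not taken from the paper's code.

```python
def counterfactual_advantage(q_values, policy, action):
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values: Q-value for each candidate action of this agent, with the
              other agents' actions held fixed (assumed critic output).
    policy:   this agent's action probabilities (assumed to sum to 1).
    action:   index of the action the agent actually took.
    """
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    # Advantage: how much better the chosen action is than the baseline.
    return q_values[action] - baseline


# Toy example with three candidate actions (illustrative values only).
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=2)

# The policy-weighted advantage over all actions is zero by construction.
expected_adv = sum(
    p * counterfactual_advantage(q, pi, a) for a, p in enumerate(pi)
)
```

Because the baseline depends only on the other agents' actions and the agent's own policy, subtracting it changes each agent's learning signal without changing the policy-weighted average, which is what the final check in the sketch shows.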
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| smac-on-smac-def-armored-parallel | COMA | Median Win Rate: 0.0 |
| smac-on-smac-def-armored-sequential | COMA | Median Win Rate: 0.0 |
| smac-on-smac-def-infantry-parallel | COMA | Median Win Rate: 50.0 |
| smac-on-smac-def-infantry-sequential | COMA | Median Win Rate: 28.1 |
| smac-on-smac-def-outnumbered-parallel | COMA | Median Win Rate: 0.0 |
| smac-on-smac-def-outnumbered-sequential | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-complicated-parallel | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-complicated-sequential | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-distant-parallel | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-distant-sequential | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-hard-parallel | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-hard-sequential | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-near-parallel | COMA | Median Win Rate: 20.0 |
| smac-on-smac-off-near-sequential | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-superhard-parallel | COMA | Median Win Rate: 0.0 |
| smac-on-smac-off-superhard-sequential | COMA | Median Win Rate: 0.0 |