
Abstract
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers the action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent, leading to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
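The abstract's key mechanism is an actor-critic adaptation in which each agent's critic conditions on the observations and actions of all agents, while each actor stays decentralized. The PyTorch sketch below (most ports listed under Code Repositories are PyTorch) illustrates that idea under stated assumptions: the network sizes, variable names, and random batch are hypothetical, not the authors' reference implementation (see openai/maddpg for that).

```python
# Minimal sketch of a centralized critic with decentralized actors.
# All dimensions and the random "batch" are illustrative assumptions.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 2, 8, 2

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# Decentralized actors: each agent acts from its own observation only.
actors = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
# Centralized critics: each conditions on ALL agents' observations and
# actions, which sidesteps the non-stationarity that plagues independent
# Q-learning (every other agent's policy is an explicit input).
critics = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]

# One illustrative policy-gradient step for agent 0 on a random batch.
obs = [torch.randn(32, OBS_DIM) for _ in range(N_AGENTS)]
acts = [torch.tanh(actors[i](obs[i])) for i in range(N_AGENTS)]
# Agent 0's actor is updated through its own centralized critic; the other
# agents' actions enter the critic as fixed (detached) inputs.
joint = torch.cat(obs + [acts[0]] + [a.detach() for a in acts[1:]], dim=-1)
actor_loss = -critics[0](joint).mean()
actor_opts[0].zero_grad()
actor_loss.backward()
actor_opts[0].step()
```

In this sketch, the abstract's ensemble training scheme would correspond to maintaining K actor networks per agent and sampling one per episode, so each critic is trained against a mixture of co-player policies rather than a single one.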
Code Repositories
All repositories below are mentioned on GitHub; a framework tag (tf / pytorch) follows each entry where available.

- tkarr21/multagent-particle-envs
- darshil333/CSE574
- qi-pang/mdpfuzz
- bonniesjli/MADDPG_Tennis (pytorch)
- gingkg/multiagent-particle-envs
- jyqhahah/rl_maddpg_matd3 (pytorch)
- NeuroCSUT/intentions (tf)
- bonniesjli/MADDPG_Tennis_UnityML (pytorch)
- krasing/DRLearningCollaboration (pytorch)
- rainandwind1/MERL (pytorch)
- shariqiqbal2810/maddpg-pytorch (pytorch)
- cyoon1729/Multi-agent-reinforcement-learning (pytorch)
- wsjeon/multiagent-particle-envs-maac
- openai/maddpg (tf)
- goldbattle/snakes_mal (tf)
- marwanihab/RL_Testing_Noise_ASRN (tf)
- sliao-mi-luku/DeepRL-multiple-agents-tennis-udacity-drlnd-p3 (pytorch)
- AleXander-Tsui/MPE
- jingdic/rgmcomm (pytorch)
- LXYYY/multiagent-particle-envs
- semitable/multiagent-particle-envs
- zowiezhang/stas (pytorch)
- karishnu/tf-agents-multi-particle-envs (tf)
- openai/multiagent-particle-envs (official)
- dtabas/multiagent-particle-envs
- MrDaubinet/collaboration-and-competition (pytorch)
- RL-WFU/multi_agent_attack (tf)
- Yutongamber/MADDPG (pytorch)
- zoeyuchao/MPE-pytorch (pytorch)
- biemann/Collaboration-and-Competition (pytorch)
- baicenxiao/shaping-advice (tf)
- marwanihab/RL_Tag_Game (pytorch)
- petsol/MultiAgentCooperation_UnityAgent_MADDPG_Udacity (pytorch)
- Zorrorulz/MultiAgentDDPG-Tennis
- kargarisaac/macrpo (pytorch)
- mauricemager/multiagent-robot (tf)
- hepengli/multiagent-particle-envs (tf)
- zoeyuchao/MPEnew-pytorch (pytorch)
- JinTanda/MADDPG_env
- cyanrain7/trpo-in-marl (pytorch)
- baradist/multiagent-particle-envs (pytorch)
- Abdelhamid-bouzid/Multi-Agent-Deep-Deterministic-Policy-Gradient (pytorch)
- yannbouteiller/gym-airsimdroneracinglab
- xuehy/pytorch-maddpg (tf)
- lachisis/multiagent-particle-envs
- baoqianwang/iros22_darl1n (tf)
- SintolRTOS/multi-agent_Example
- caslab-vt/SARNet (tf)
- google/maddpg-replication (tf)
- JohannesAck/MATD3implementation (tf)
- isp1tze/MAProj (pytorch)
- Stippler/cow-simulator (pytorch)
- tkarr21/multiagent-particle-envs
- ksajan/DDPG-MAPE (tf)
- Steven-Ho/multiagent-particle-envs
- anonymous-iclr22/trust-region-in-multi-agent-reinforcement-learning (pytorch)
- rallen10/multiagent-particle-envs
- debajit15kgp/multiagent-envs
- jansenkeith501/CS295-MADDPG
- Chan1998/MAAC (pytorch)
- philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients (pytorch)
- jiayu-ch15/MPE-for-curriculum-learning
- johannesharmse/multi_agent_RL
- shariqiqbal2810/multiagent-particle-envs
- EyaRhouma/collaboration-competition-MADDPG (pytorch)
- rainandwind1/MADDPG-reconstruct (pytorch)
- wsjeon/multiagent-particle-envs-v2
- morning9393/HAPPO-HATRPO (pytorch)
- starry-sky6688/MADDPG (pytorch)
- quantumiracle/mars (pytorch)
- JohannesAck/tf2multiagentrl (tf)
- rainandwind1/Maddpg_multiagent (pytorch)
- raoshashank/Tennis-with-MADDPG (pytorch)
- pr-shukla/maddpg-keras (tf)
- schroederdewitt/multiagent-particle-envs
- SihongHo/multiagent-particle-envs
- thechrisyoon08/Multi-agent-reinforcement-learning (pytorch)
- biorobotics/PRD_environments
- madhur-tandon/RL-Project (pytorch)
- facebookresearch/benchmarl (pytorch)
- Ah31/maddpg_pytorch (pytorch)
- thechrisyoon08/marl (pytorch)
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| smac-on-smac-def-armored-sequential | MADDPG | Median Win Rate: 90.6% |
| smac-on-smac-def-infantry-sequential | MADDPG | Median Win Rate: 100.0% |
| smac-on-smac-def-outnumbered-sequential | MADDPG | Median Win Rate: 81.3% |
| smac-on-smac-off-complicated-sequential | MADDPG | Median Win Rate: 0.0% |
| smac-on-smac-off-distant-sequential | MADDPG | Median Win Rate: 0.0% |
| smac-on-smac-off-hard-sequential | MADDPG | Median Win Rate: 0.0% |
| smac-on-smac-off-near-sequential | MADDPG | Median Win Rate: 75.0% |
| smac-on-smac-off-superhard-sequential | MADDPG | Median Win Rate: 0.0% |