3 months ago

A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

Abstract

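In fully cooperative multi-agent reinforcement learning (MARL) settings, the environment is highly stochastic because each agent has only partial observability and the policies of the other agents change continuously. To address these challenges, we propose a unified framework, DFAC (Distributional Factorization of Action Values), which integrates distributional RL with value function factorization methods. The framework generalizes expected value function factorization methods to the level of return distributions, so that the return distribution itself can be factorized. To validate DFAC, we first demonstrate its ability to factorize value functions in a simple matrix game with stochastic rewards. We then run experiments on all "Super Hard" maps of the StarCraft Multi-Agent Challenge and on six custom-designed "Ultra Hard" maps; the results show that DFAC significantly outperforms several baselines in most cases.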
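To make the idea of "generalizing expected value factorization to return distributions" concrete, here is a minimal toy sketch in Python with NumPy. It is not taken from the authors' repository. It assumes a VDN-style additive factorization and represents each agent's return distribution by a few quantile values at shared quantile fractions; the function names `vdn_mix` and `quantile_mixture`, and all numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np


def vdn_mix(agent_q_values):
    """Expected-value factorization (VDN-style): the joint value is the sum of per-agent values."""
    return float(np.sum(agent_q_values))


def quantile_mixture(agent_quantiles):
    """Distributional analogue (assumed for illustration): sum per-agent quantile
    functions evaluated at the same quantile fractions.

    agent_quantiles has shape (n_agents, n_quantiles); each row is sorted ascending.
    """
    agent_quantiles = np.asarray(agent_quantiles, dtype=float)
    return agent_quantiles.sum(axis=0)


# Hypothetical per-agent return quantiles at fractions 0.125, 0.375, 0.625, 0.875.
agent_quantiles = np.array([
    [0.0, 1.0, 2.0, 5.0],   # agent 1: wide, skewed return distribution
    [1.0, 1.5, 2.0, 2.5],   # agent 2: narrow return distribution
])

joint_quantiles = quantile_mixture(agent_quantiles)   # joint return quantiles
agent_means = agent_quantiles.mean(axis=1)            # per-agent expected values
joint_mean = joint_quantiles.mean()                   # expected joint return

print(joint_quantiles)                  # [1.  2.5 4.  7.5]
print(joint_mean, vdn_mix(agent_means)) # 3.75 3.75
```

In this sketch the mean of the factorized joint distribution equals the sum of the per-agent means, so the expected-value factorization is recovered as a special case while the joint quantiles additionally describe the spread of the return.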

Code Repository

j3soon/dfac-extended
Official
PyTorch
Mentioned on GitHub

Benchmarks

| Benchmark | Method | Average Score | Median Win Rate |
|---|---|---|---|
| smac-on-smac-26m-vs-30m | VDN | 16.69 | 23.01 |
| smac-on-smac-26m-vs-30m | QMIX | 18.23 | 62.78 |
| smac-on-smac-26m-vs-30m | QPLEX | 18.66 | 78.12 |
| smac-on-smac-26m-vs-30m | DMIX | 19.17 | 81.82 |
| smac-on-smac-26m-vs-30m | DDN | 18.49 | 67.90 |
| smac-on-smac-26m-vs-30m | DPLEX | 18.49 | 59.38 |
| smac-on-smac-27m-vs-30m | QPLEX | 19.33 | 78.12 |
| smac-on-smac-27m-vs-30m | DPLEX | 19.62 | 90.62 |
| smac-on-smac-3s5z-vs-3s6z-1 | DPLEX | 20.27 | 90.62 |
| smac-on-smac-3s5z-vs-3s6z-1 | QPLEX | 20.42 | 84.38 |
| smac-on-smac-3s5z-vs-4s6z | QMIX | 13.09 | - |
| smac-on-smac-3s5z-vs-4s6z | DPLEX | 14.99 | - |
| smac-on-smac-3s5z-vs-4s6z | QPLEX | 13.60 | - |
| smac-on-smac-3s5z-vs-4s6z | DDN | 19.65 | 89.77 |
| smac-on-smac-3s5z-vs-4s6z | DMIX | 18.61 | 83.52 |
| smac-on-smac-3s5z-vs-4s6z | VDN | 17.16 | 47.16 |
| smac-on-smac-6h-vs-8z-1 | QPLEX | 15.95 | - |
| smac-on-smac-6h-vs-8z-1 | DPLEX | 17.88 | 43.75 |
| smac-on-smac-6h-vs-9z | DPLEX | 14.84 | - |
| smac-on-smac-6h-vs-9z | DMIX | 13.73 | - |
| smac-on-smac-6h-vs-9z | VDN | 13.57 | - |
| smac-on-smac-6h-vs-9z | QPLEX | 13.86 | - |
| smac-on-smac-6h-vs-9z | DDN | 16.00 | 0.28 |
| smac-on-smac-6h-vs-9z | QMIX | 12.37 | 1.14 |
| smac-on-smac-corridor | DPLEX | 19.08 | 81.25 |
| smac-on-smac-corridor | QPLEX | 18.73 | 75.00 |
| smac-on-smac-corridor-2z-vs-24zg | DPLEX | 10.71 | 3.12 |
| smac-on-smac-corridor-2z-vs-24zg | VDN | 7.78 | 0.00 |
| smac-on-smac-corridor-2z-vs-24zg | QPLEX | 6.44 | - |
| smac-on-smac-corridor-2z-vs-24zg | DDN | 11.10 | 41.19 |
| smac-on-smac-corridor-2z-vs-24zg | DMIX | 7.41 | - |
| smac-on-smac-corridor-2z-vs-24zg | QMIX | 4.80 | - |
| smac-on-smac-mmm2-1 | DPLEX | 19.93 | 96.88 |
| smac-on-smac-mmm2-1 | QPLEX | 19.60 | 96.88 |
| smac-on-smac-mmm2-7m2m1m-vs-8m4m1m | QPLEX | 15.52 | 46.88 |
| smac-on-smac-mmm2-7m2m1m-vs-8m4m1m | DPLEX | 15.89 | 50.00 |
| smac-on-smac-mmm2-7m2m1m-vs-8m4m1m | DDN | 16.50 | 56.82 |
| smac-on-smac-mmm2-7m2m1m-vs-8m4m1m | QMIX | 14.40 | 29.55 |
| smac-on-smac-mmm2-7m2m1m-vs-8m4m1m | DMIX | 16.24 | 63.35 |
| smac-on-smac-mmm2-7m2m1m-vs-8m4m1m | VDN | 13.13 | 13.35 |
| smac-on-smac-mmm2-7m2m1m-vs-9m3m1m | QPLEX | 19.06 | 90.62 |
| smac-on-smac-mmm2-7m2m1m-vs-9m3m1m | QMIX | 19.01 | 88.64 |
| smac-on-smac-mmm2-7m2m1m-vs-9m3m1m | DDN | 19.45 | 90.34 |
| smac-on-smac-mmm2-7m2m1m-vs-9m3m1m | DPLEX | 19.40 | 90.62 |
| smac-on-smac-mmm2-7m2m1m-vs-9m3m1m | VDN | 17.30 | 75.00 |
| smac-on-smac-mmm2-7m2m1m-vs-9m3m1m | DMIX | 19.33 | 92.33 |

A "-" marks a metric that the listing does not report for that entry.
