
DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning


Abstract

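In fully cooperative multi-agent reinforcement learning (MARL) settings, the environment is highly stochastic because each agent only partially observes it and the policies of the other agents change continuously. To address these issues, this paper proposes the Distributional Value Function Factorization (DFAC) framework, which combines distributional reinforcement learning with value function factorization and generalizes expected value function factorization methods to their distributional (DFAC) variants. DFAC extends the individual utility functions from deterministic variables to random variables and models the quantile function of the total return as a quantile mixture. To validate DFAC, the paper first demonstrates its ability to factorize a simple two-step matrix game with stochastic rewards, and then runs experiments on all "Super Hard" tasks of the StarCraft Multi-Agent Challenge (SMAC). The results show that DFAC is able to outperform the expected value function factorization baselines.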
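To make the phrase "quantile mixture" concrete, here is a minimal sketch, not the authors' implementation. It assumes a quantile mixture is a non-negative weighted sum of per-agent quantile functions evaluated at the same quantile fraction. The helper name `quantile_mixture`, the Gaussian utility distributions, and the unit weights are all hypothetical choices made only for illustration.

```python
from statistics import NormalDist, fmean


def quantile_mixture(quantile_fns, weights):
    """Return the function w -> sum_k weights[k] * quantile_fns[k](w).

    Non-negative weights keep the result non-decreasing in w,
    so it is itself a valid quantile function.
    """
    if any(b < 0 for b in weights):
        raise ValueError("weights must be non-negative")

    def mixed(w):
        return sum(b * f(w) for f, b in zip(quantile_fns, weights))

    return mixed


# Hypothetical per-agent utility distributions (illustrative only).
agent_utils = [NormalDist(mu=1.0, sigma=0.5), NormalDist(mu=2.0, sigma=1.0)]
quantile_fns = [d.inv_cdf for d in agent_utils]
weights = [1.0, 1.0]  # assumption: unit weights, i.e. a plain sum

total_quantile = quantile_mixture(quantile_fns, weights)

# Evaluate the mixed quantile function at N midpoint quantile fractions.
N = 1000
fractions = [(i + 0.5) / N for i in range(N)]
samples = [total_quantile(w) for w in fractions]

# The mixture is monotone, and its mean is the weighted sum of agent means.
is_monotone = all(a <= b for a, b in zip(samples, samples[1:]))
approx_mean = fmean(samples)
```

Note that summing quantile functions is not the same as summing independent random variables: in this toy case the mixed quantile function has spread 0.5 + 1.0 = 1.5, whereas the sum of two independent Gaussians with these parameters would have standard deviation sqrt(1.25).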

Code repository

j3soon/dfac
Official
tf
Mentioned in GitHub

Benchmarks

| Benchmark | Method | Average Score | Median Win Rate |
|---|---|---|---|
| smac-on-smac-27m-vs-30m | DMIX | 19.43 | 85.45 |
| smac-on-smac-27m-vs-30m | VDN | 18.45 | 63.12 |
| smac-on-smac-27m-vs-30m | DIQL | 14.45 | 6.02 |
| smac-on-smac-27m-vs-30m | QMIX | 19.41 | 84.77 |
| smac-on-smac-27m-vs-30m | DDN | 19.71 | 91.48 |
| smac-on-smac-27m-vs-30m | IQL | 14.01 | 2.27 |
| smac-on-smac-3s5z-vs-3s6z-1 | DIQL | 17.52 | 62.22 |
| smac-on-smac-3s5z-vs-3s6z-1 | QMIX | 20.16 | 67.22 |
| smac-on-smac-3s5z-vs-3s6z-1 | DDN | 20.94 | 94.03 |
| smac-on-smac-3s5z-vs-3s6z-1 | IQL | 16.54 | 29.83 |
| smac-on-smac-3s5z-vs-3s6z-1 | VDN | 19.75 | 89.2 |
| smac-on-smac-3s5z-vs-3s6z-1 | DMIX | 19.7 | 91.08 |
| smac-on-smac-6h-vs-8z-1 | VDN | 15.41 | 0 |
| smac-on-smac-6h-vs-8z-1 | DDN | 19.4 | 83.92 |
| smac-on-smac-6h-vs-8z-1 | QMIX | 14.37 | 12.78 |
| smac-on-smac-6h-vs-8z-1 | DMIX | 17.14 | 49.43 |
| smac-on-smac-6h-vs-8z-1 | IQL | 13.78 | 0 |
| smac-on-smac-6h-vs-8z-1 | DIQL | 14.94 | 0.00 |
| smac-on-smac-corridor | DIQL | 19.68 | 91.62 |
| smac-on-smac-corridor | VDN | 19.47 | 85.34 |
| smac-on-smac-corridor | DDN | 20 | 95.4 |
| smac-on-smac-corridor | QMIX | 15.07 | 37.61 |
| smac-on-smac-corridor | DMIX | 19.66 | 90.45 |
| smac-on-smac-corridor | IQL | 19.42 | 84.87 |
| smac-on-smac-def-armored-parallel | DMIX | - | 90.0 |
| smac-on-smac-def-armored-parallel | DDN | - | 0.0 |
| smac-on-smac-def-armored-parallel | DIQL | - | 0.0 |
| smac-on-smac-def-armored-sequential | DDN | - | 71.9 |
| smac-on-smac-def-armored-sequential | DIQL | - | 53.1 |
| smac-on-smac-def-armored-sequential | DMIX | - | 81.3 |
| smac-on-smac-def-infantry-parallel | DMIX | - | 90.0 |
| smac-on-smac-def-infantry-parallel | DDN | - | 20.0 |
| smac-on-smac-def-infantry-sequential | DIQL | - | 93.8 |
| smac-on-smac-def-infantry-sequential | DDN | - | 90.6 |
| smac-on-smac-def-infantry-sequential | DMIX | - | 100 |
| smac-on-smac-def-outnumbered-parallel | DIQL | - | 0.0 |
| smac-on-smac-def-outnumbered-parallel | DMIX | - | 5.0 |
| smac-on-smac-def-outnumbered-parallel | DDN | - | 0.0 |
| smac-on-smac-def-outnumbered-sequential | DDN | - | 0.0 |
| smac-on-smac-def-outnumbered-sequential | DMIX | - | 0.0 |
| smac-on-smac-def-outnumbered-sequential | DIQL | - | 0.0 |
| smac-on-smac-mmm2-1 | DIQL | 19.21 | 85.23 |
| smac-on-smac-mmm2-1 | QMIX | 19.42 | 92.44 |
| smac-on-smac-mmm2-1 | VDN | 19.36 | 89.2 |
| smac-on-smac-mmm2-1 | IQL | 17.5 | 68.92 |
| smac-on-smac-mmm2-1 | DDN | 20.9 | 97.22 |
| smac-on-smac-mmm2-1 | DMIX | 19.87 | 95.11 |
| smac-on-smac-off-complicated-parallel | DMIX | - | 0.0 |
| smac-on-smac-off-complicated-parallel | DDN | - | 0.0 |
| smac-on-smac-off-complicated-parallel | DIQL | - | 0.0 |
| smac-on-smac-off-distant-parallel | DIQL | - | 0.0 |
| smac-on-smac-off-distant-parallel | DDN | - | 0.0 |
| smac-on-smac-off-distant-parallel | DMIX | - | 0.0 |
| smac-on-smac-off-hard-parallel | DDN | - | 0.0 |
| smac-on-smac-off-hard-parallel | DIQL | - | 0.0 |
| smac-on-smac-off-hard-parallel | DMIX | - | 0.0 |
| smac-on-smac-off-near-parallel | DIQL | - | 0.0 |
| smac-on-smac-off-near-parallel | DDN | - | 0.0 |
| smac-on-smac-off-near-parallel | DMIX | - | 0.0 |
| smac-on-smac-off-superhard-parallel | DDN | - | 0.0 |
| smac-on-smac-off-superhard-parallel | DIQL | - | 0.0 |
| smac-on-smac-off-superhard-parallel | DMIX | - | 0.0 |

A "-" marks a metric that the listing does not report for that row.
