The Influence of Decision Mechanisms on Network Cooperation Level in Q-learning Evolutionary Games

Abstract: In networked games, individuals making decisions often cannot observe their neighbors' payoffs. To address this problem, this study exploits the self-experiential learning property of Q-learning and proposes a Q-learning evolutionary game model. Because different Q-learning decision mechanisms affect the network cooperation level differently, three mechanisms (ε-greedy, Boltzmann, and Max-plus) are compared in experiments across different network types, game model parameters, and reinforcement learning parameters, and their influence on the network cooperation level is analyzed quantitatively. The experimental results show that, compared with traditional evolutionary game models, the Q-learning evolutionary game model generally raises the cooperation level of the network, and that the choice of decision mechanism matters: the cooperation level of the model using the ε-greedy mechanism is approximately 35% and 37% higher than that of the models using the Boltzmann and Max-plus mechanisms, respectively. Lower learning rates, higher discount rates, and moderate payoff uniformity promote cooperation among individuals in the network. With the ε-greedy mechanism, the cooperation level under a lower learning rate is about 40% higher than under a higher learning rate, and the cooperation level under a higher discount rate is about 45% higher than under a lower discount rate. At higher exploration rates, the Max-plus mechanism, which takes the global attributes of individuals into account, yields a network average payoff about 22% and 17% higher than that of the Q-learning models using the ε-greedy and Boltzmann mechanisms, respectively.
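The abstract compares decision mechanisms without giving formulas. The following is a minimal, hypothetical sketch of the general setup, not the authors' model: agents on a periodic square lattice play a weak prisoner's dilemma (R = 1, T = b, S = P = 0) with their four neighbors, each agent learns only from its own accumulated payoff through the one-step update Q(s,a) ← Q(s,a) + α[r + γ·max Q(s′,·) − Q(s,a)] with its current action as the state, and actions are chosen by ε-greedy or Boltzmann selection. The lattice, payoff matrix, state definition, temperature, function names (`simulate`, `epsilon_greedy`, `boltzmann`), and all parameter values are assumptions for illustration; the Max-plus mechanism is not sketched.

```python
import math
import random

C, D = 0, 1  # cooperate, defect

def payoff(me, other, b):
    """Payoff to `me` in one weak prisoner's dilemma game (assumed R=1, T=b, S=P=0)."""
    if me == C:
        return 1.0 if other == C else 0.0
    return b if other == C else 0.0

def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def boltzmann(q_row, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    m = max(q_row)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in q_row]
    return rng.choices(range(len(q_row)), weights=weights)[0]

def simulate(policy="epsilon", size=20, rounds=200, b=1.2, alpha=0.1,
             gamma=0.9, epsilon=0.02, temperature=0.1, seed=0):
    """Hypothetical toy model: final fraction of cooperators on a size x size torus."""
    rng = random.Random(seed)
    n = size * size
    state = [rng.randrange(2) for _ in range(n)]       # state = agent's current action
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n)]   # q[i][state][action]
    neighbours = []
    for i in range(n):
        x, y = divmod(i, size)
        neighbours.append([((x + dx) % size) * size + (y + dy) % size
                           for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))])
    for _ in range(rounds):
        # 1) each agent picks its next action from its own Q-table only
        if policy == "epsilon":
            actions = [epsilon_greedy(q[i][state[i]], epsilon, rng) for i in range(n)]
        elif policy == "boltzmann":
            actions = [boltzmann(q[i][state[i]], temperature, rng) for i in range(n)]
        else:
            raise ValueError(f"unknown policy: {policy}")
        # 2) reward = own total payoff; neighbours' payoffs are never observed
        rewards = [sum(payoff(actions[i], actions[j], b) for j in neighbours[i])
                   for i in range(n)]
        # 3) one-step Q-learning update; the chosen action becomes the next state
        for i in range(n):
            s, a = state[i], actions[i]
            target = rewards[i] + gamma * max(q[i][a])
            q[i][s][a] += alpha * (target - q[i][s][a])
        state = actions
    return state.count(C) / n
```

Calling `simulate("epsilon", alpha=0.1, gamma=0.9)` or `simulate("boltzmann")` returns the final cooperator fraction, so sweeping `alpha`, `gamma`, or `epsilon` mimics the kind of parameter comparison the abstract describes. Numbers from this toy model are not expected to reproduce the reported percentages.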

Authors: ZHANG Zundong, WANG Yannan, ZHOU Huijuan, ZHANG Yifan

Affiliations: Beijing Key Laboratory of Urban Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China; Intelligent Urban Transportation Systems Laboratory, University of Washington, Seattle 98195, USA; State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China

Source: Computer Engineering (《计算机工程》), 2023, Issue 6, pp. 99-106 and 114 (9 pages). Indexed in CAS and CSCD; Peking University core journal.

Funding: National Key Research and Development Program of China under the 13th Five-Year Plan (2018YFB1601000).

Keywords: Q-learning; decision mechanism; network evolutionary game; cooperation level; discount rate

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]