Deep Monte Carlo algorithm for Gongzhu game (“拱猪”游戏的深度蒙特卡洛博弈算法)

Cited by: 2
Abstract: Existing convolutional neural network models for the Gongzhu game are computationally complex and depend heavily on expert knowledge. To address this problem, a deep Monte Carlo algorithm that combines a deep neural network with the Monte Carlo method is proposed for Gongzhu. The algorithm simulates and evaluates play through self-play and uses a deep Q-network in place of a Q-table to update Q-values, which allows efficient exploration and exploitation of Gongzhu strategies. Distributed parallel computing is used to improve training efficiency, and, compared with the traditional Monte Carlo method, the algorithm effectively addresses the high-variance problem. After 24 h of training on a single server with one GPU, the resulting agent played 10000 games against the Gongzhu convolutional model. The experimental results show that the agent achieves a win rate of 78.3% and an average of 67 points per game. An analysis of specific examples further verifies the effectiveness of the algorithm and the good performance of the agent.
Authors: WU Licheng (吴立成); WU Qifei (吴启飞); ZHONG Hongming (钟宏鸣); LI Xiali (李霞丽) (School of Information Engineering, Minzu University of China, Beijing 100081, China)
Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), 2022, No. 12, pp. 121-128 (8 pages). Indexed in: CAS; Peking University Core Journals (北大核心).
Funding: National Natural Science Foundation of China (62276285).
Keywords: artificial intelligence; Gongzhu; deep reinforcement learning; Monte Carlo method
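
The abstract describes replacing a Q-table with a learned Q-function whose values are updated from self-play episodes. The sketch below illustrates that general idea only and is not the authors' implementation. It makes several assumptions: a tiny linear approximator (`LinearQ`) stands in for the deep Q-network, the feature vectors are placeholders for whatever state and action encoding the paper uses, and every function and class name here is hypothetical. It shows three pieces: computing Monte Carlo returns for a finished episode, epsilon-greedy action selection over legal actions, and regressing the Q-estimate toward the observed return.

```python
import random


def monte_carlo_targets(rewards, gamma=1.0):
    """Return the discounted return G_t for every step of one finished episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def epsilon_greedy(q_values, legal_actions, epsilon, rng=random):
    """With probability epsilon pick a random legal action, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_values[a])


class LinearQ:
    """Assumed stand-in for the deep Q-network: Q(s, a) = w . phi(s, a)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, features):
        return sum(w * x for w, x in zip(self.w, features))

    def update(self, features, target):
        """One gradient step on the squared error (Q - target)^2 / 2."""
        error = self.predict(features) - target
        self.w = [w - self.lr * error * x for w, x in zip(self.w, features)]
```

In a training loop built on this sketch, each self-play episode would be played with `epsilon_greedy`, its per-step returns computed with `monte_carlo_targets` once the final score is known, and each visited state-action pair passed to `update` with its return as the target.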