Abstract
Machine learning has been attracting growing attention and is among the most active directions in artificial intelligence. The recent growth of reinforcement learning research is driven in part by agents that reach levels of play in video games that humans cannot match. Policy-based reinforcement learning algorithms can adapt well to a game environment, explore a relatively stable path, and pursue a globally optimal objective. This paper studies playing the game Flappy Bird with the reinforcement learning algorithm Q-learning. It first reviews the theoretical foundations of reinforcement learning, including Markov decision processes, dynamic programming, value function approximation, and temporal-difference learning. The focus is on building mathematical models of the states, actions, and rewards in Flappy Bird; to obtain the optimal policy, the objective in every state is to maximize the total expected reward. On this basis, a deep convolutional neural network is trained so that images of the game state can be recognized and classified. The system simulation successfully applies a deep Q-learning model to let Flappy Bird learn by itself: the exploration probability ε decreases linearly from 0.6 to 0 over 550,000 updates, the learning curve is steep at first and then levels off, convergence is reached in a relatively short time, and the training error is low. The trained agent achieves the desired performance, with an average score of 86 and a best score of 335, surpassing ordinary human players.
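For reference, a standard formulation of the objective the abstract describes: the optimal action-value function satisfies the Bellman optimality equation, and tabular Q-learning updates toward it with a temporal-difference step. The notation below is the conventional one; the paper's exact symbols are not given in this abstract.

    Q^*(s,a) = \mathbb{E}\!\left[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \,\right]

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \,\right]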
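The only training detail the abstract pins down numerically is the exploration schedule. Below is a minimal Python sketch of that ε-greedy linear annealing, assuming the usual two-action Flappy Bird setup (flap / do nothing); the constant and function names are illustrative, not taken from the paper.

    import random
    import numpy as np

    # Schedule stated in the abstract: epsilon decreases linearly from 0.6
    # to 0 over 550,000 updates, then stays at 0 (pure exploitation).
    EPS_START, EPS_END, ANNEAL_STEPS = 0.6, 0.0, 550_000

    def epsilon(step: int) -> float:
        """Exploration probability after `step` parameter updates."""
        frac = min(step / ANNEAL_STEPS, 1.0)
        return EPS_START + frac * (EPS_END - EPS_START)

    def select_action(q_values: np.ndarray, step: int) -> int:
        """Epsilon-greedy choice over the discrete actions (flap / no-op)."""
        if random.random() < epsilon(step):
            return random.randrange(len(q_values))  # explore: random action
        return int(np.argmax(q_values))             # exploit: greedy on Q estimates

    # Early in training the agent explores often; by 550,000 updates it is greedy.
    print(epsilon(0), epsilon(275_000), epsilon(550_000))  # 0.6 0.3 0.0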
Source
《计算机科学与应用》 (Computer Science and Application)
2021, No. 7, pp. 1994-2007 (14 pages)