
基于Boltzmann机的机器人自主学习算法

Self-learning Algorithm for Robot Based on Boltzmann Machine
Abstract To address the self-balancing motion control problem of a two-wheeled robot, a bionic self-learning algorithm is proposed that combines Skinner's operant conditioning learning mechanism with the Boltzmann machine. The algorithm uses the Metropolis criterion of the Boltzmann machine to balance the proportion of exploration and exploitation in operant-conditioning learning, and selects the optimal behavior with a certain probability according to a probabilistic orientation mechanism. In this way the robot can acquire, in an unknown environment, a bionic self-learning skill similar to that of humans or animals and achieve self-balancing motion control. Finally, simulation experiments were carried out comparing the operant-conditioning learning algorithm based on the Boltzmann machine with one based on a greedy strategy. The results show that the Boltzmann-machine-based operant-conditioning learning algorithm gives the robot stronger balance-control skill and better dynamic performance, demonstrating the robot's self-learning capability.
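The exploration-exploitation balance described in the abstract rests on the Metropolis criterion: a sub-optimal (exploratory) behavior is accepted with probability exp(-ΔQ/T), so a high temperature T favors exploration while a low temperature favors exploiting the currently best-valued behavior. The sketch below only illustrates this selection rule; the function name, the behavior-value table, and the temperature schedule are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def select_action(q_values, temperature):
    """Metropolis-style behavior selection: prefer the greedy behavior,
    but accept a random exploratory behavior with probability exp(-dQ/T)."""
    greedy = max(q_values, key=q_values.get)      # currently best-valued behavior
    candidate = random.choice(list(q_values))     # randomly proposed behavior
    dq = q_values[greedy] - q_values[candidate]   # value lost by exploring
    if dq <= 0 or random.random() < math.exp(-dq / temperature):
        return candidate                          # explore (always if no worse)
    return greedy                                 # exploit

# Usage: as the temperature is annealed toward zero, selection shifts from
# exploration to exploitation of the learned balancing behavior.
q = {"lean_left": 0.2, "lean_right": 0.5, "hold": 0.4}
for t in (1.0, 0.1, 0.01):
    print(f"T={t}: {select_action(q, t)}")
```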
Source Journal of Beijing University of Technology (《北京工业大学学报》; indexed in EI, CAS, CSCD; Peking University core journal), 2012, No. 1, pp. 60-64 (5 pages)
Funding National High-Tech R&D (863) Program of China (2007AA04Z226); National Natural Science Foundation of China (60774077); Key Project of the Beijing Municipal Education Commission (KZ200810005002)
Keywords Boltzmann machine; Skinner's operant conditioning; greedy strategy; self-learning; two-wheeled robot

References (16)

1. RAPHAEL B. The robot 'Shakey' and 'his' successors [J]. Computers and People, 1976, 25: 7-21.
2. BROOKS R A. From earwigs to humans [J]. Robotics and Autonomous Systems, 1997, 20: 291-304.
3. WOLF R, HEISENBERG M. Basic organization of operant behavior as revealed in Drosophila flight orientation [J]. Journal of Comparative Physiology A, 1991, 169: 699-705.
4. TOURETZKY D S, SAKSIDA L M. Operant conditioning in Skinnerbots [J]. Adaptive Behavior, 1997, 5(3/4): 219-247.
5. ZALAMA E, GOMEZ J, PAUL M, et al. Adaptive behavior navigation of a mobile robot [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2002, 32(1): 160-169.
6. DOMINGUEZ S, ZALAMA E. Robot learning in a social robot [J]. Lecture Notes in Computer Science, 2006, 4095: 691-702.
7. HINTON G E, SEJNOWSKI T J, ACKLEY D H. Boltzmann machines: constraint satisfaction networks that learn [R]. Pittsburgh: Carnegie Mellon University, technical report, 1984: 1-37.
8. HINTON G E, SEJNOWSKI T J. Learning and relearning in Boltzmann machines [M]// Parallel Distributed Processing. Cambridge: MIT Press, 1986: 282-317.
9. GUO Mao-zu, LIU Yang, MALEC J. A new Q-learning algorithm based on the Metropolis criterion [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2004, 34(5): 2140-2143.
10. DAHMANI Y, BENYETTOU A. Seek of an optimal way by Q-learning [J]. Journal of Computer Science, 2005, 1(1): 28-30.

Secondary references (19)

1. YANG Xing-ming, DING Xue-ming, ZHANG Pei-ren, ZHAO Peng. Motion control of a two-wheeled mobile inverted pendulum [J]. Journal of Hefei University of Technology (Natural Science), 2005, 28(11): 1485-1488.
2. HA Y S, YUTA S. Trajectory tracking control for navigation of the inverse pendulum type self-contained mobile robot [J]. Robotics and Autonomous Systems, 1996, 17(1/2): 65-80.
3. GRASSER F, D'ARRIGO A, COLOMBI S, et al. JOE: a mobile, inverted pendulum [J]. IEEE Transactions on Industrial Electronics, 2002, 49(1): 107-114.
4. SALERNO A, ANGELES J. On the nonlinear controllability of a quasiholonomic mobile robot [C]// Proc of IEEE International Conference on Robotics and Automation, 2003: 3379-3384.
5. BARTO A G, SUTTON R S, ANDERSON C W. Neuronlike adaptive elements that can solve difficult learning control problems [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1983, 13(5): 834-846.
6. ANDERSON C W. Learning to control an inverted pendulum using neural networks [J]. IEEE Control Systems Magazine, 1989, 9(4): 31-35.
7. WHITE D A, SOFGE D A. Handbook of intelligent control: neural, fuzzy, and adaptive approaches [M]. New York: Van Nostrand Reinhold, 1992.
8. SI J N, WANG Y T. On-line learning control by association and reinforcement [J]. IEEE Transactions on Neural Networks, 2001, 12(2): 264-276.
9. LIN C T, LEE C S G. Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems [J]. IEEE Transactions on Fuzzy Systems, 1994, 2(1): 46-63.
10. BERENJI H R, KHEDKAR P. Learning and tuning fuzzy logic controllers through reinforcements [J]. IEEE Transactions on Neural Networks, 1992, 3(5): 724-740.
