Abstract
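Two important subproblems of reinforcement learning, continuous spaces and linguistic rewards, are considered together, and a new learning method is proposed: linguistic-reward-oriented Takagi-Sugeno (T-S) fuzzy reinforcement learning. The learning agent is built on the Q-learning method and the Takagi-Sugeno fuzzy inference system; it is suited to complex learning tasks in continuous domains and can also be used to design Takagi-Sugeno fuzzy logic controllers. The double inverted pendulum control system is taken as an example.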
This paper presents a learning method that simultaneously addresses two significant subproblems in reinforcement learning: continuous spaces and linguistic rewards. A linguistic-reward-oriented Takagi-Sugeno fuzzy reinforcement learning (LRTSFRL) model was constructed by combining the Q-learning method with Takagi-Sugeno type fuzzy inference systems. The proposed method is capable of solving complicated learning tasks in continuous domains and can be used to design Takagi-Sugeno fuzzy logic controllers. Experiments with the double inverted pendulum system demonstrated the improved performance of the scheme.
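The abstract gives no algorithmic detail, so the following is only a minimal sketch of how Q-learning and zero-order Takagi-Sugeno inference are commonly combined in generic fuzzy Q-learning. It is not the paper's LRTSFRL model. Everything in it is an assumption made for illustration: the single state variable, the triangular membership functions, the candidate actions, the function names, and in particular the `LINGUISTIC_REWARD` table, which simply maps linguistic labels to numbers.

```python
import random

# Hypothetical setup: one state variable x in [-1, 1] covered by three
# triangular fuzzy sets; each rule chooses among the same candidate actions.
CENTERS = [-1.0, 0.0, 1.0]
ACTIONS = [-1.0, 0.0, 1.0]
# Assumed numeric stand-ins for linguistic rewards (not from the paper).
LINGUISTIC_REWARD = {"bad": -1.0, "fair": 0.0, "good": 1.0}


def memberships(x):
    """Normalized firing strengths of the triangular sets (half-width 1)."""
    x = max(-1.0, min(1.0, x))  # clamp so at least one rule always fires
    raw = [max(0.0, 1.0 - abs(x - c)) for c in CENTERS]
    total = sum(raw)
    return [r / total for r in raw]


def select_actions(q, epsilon, rng):
    """Pick one candidate-action index per rule, epsilon-greedy on that rule's q-row."""
    chosen = []
    for row in q:
        if rng.random() < epsilon:
            chosen.append(rng.randrange(len(row)))
        else:
            chosen.append(max(range(len(row)), key=row.__getitem__))
    return chosen


def ts_output(phi, chosen):
    """Zero-order T-S inference: firing-strength-weighted sum of rule consequents."""
    return sum(p * ACTIONS[a] for p, a in zip(phi, chosen))


def q_value(q, phi, chosen):
    """Q-value of the blended action actually applied."""
    return sum(p * q[i][a] for i, (p, a) in enumerate(zip(phi, chosen)))


def state_value(q, phi):
    """Greedy value estimate of a state: weighted sum of each rule's best q."""
    return sum(p * max(row) for p, row in zip(phi, q))


def update(q, phi, chosen, reward_label, phi_next, alpha=0.1, gamma=0.9):
    """One temporal-difference step, credited to each rule by its firing strength."""
    r = LINGUISTIC_REWARD[reward_label]
    td = r + gamma * state_value(q, phi_next) - q_value(q, phi, chosen)
    for i, a in enumerate(chosen):
        q[i][a] += alpha * phi[i] * td
    return td


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0] * len(ACTIONS) for _ in CENTERS]
    phi = memberships(0.5)
    chosen = select_actions(q, epsilon=0.1, rng=rng)
    print("control output:", ts_output(phi, chosen))
    print("TD error:", update(q, phi, chosen, "good", memberships(0.0)))
```

In this sketch the continuous state is handled by the fuzzy partition, while the linguistic reward is reduced to a lookup table; a fuller treatment would presumably represent the reward as a fuzzy quantity rather than a crisp number.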
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal
2002, Issue 10, pp. 1393-1396 (4 pages)
Journal of Tsinghua University (Science and Technology)
Funding
National Key Basic Research Development Program of China ("973" Program) (G1999032707)