Time-in-action RL


Abstract: The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are known in control-theoretic form. The key insight that enables this integration is to model an explicit time function that maps a state-action pair to the time the underlying controller takes to accomplish the action. In this framework, an action is described by its value (action value) and by the time it takes to perform (action time). The action value results from the RL policy given a state. The action time is estimated by an explicit time model learnt from the measured activities of the underlying controller. The RL value network is then trained with the time model embedded to predict action time. The approach is tested on a variant of Atari Pong and is shown to be convergent.
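The split between action value and action time can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear time model, the paddle-position state and action, the sample controller measurements, and the semi-MDP style discount `gamma ** action_time` are all assumptions made for illustration, since the abstract does not say how the time model is embedded in the value network.

```python
def fit_time_model(samples):
    """Least-squares fit of duration ~ a * distance + b.

    `samples` holds (distance, duration) pairs measured from the
    underlying controller (hypothetical data in this sketch).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    a = cov_xy / var_x
    b = mean_y - a * mean_x

    def time_model(state, action):
        # Assumed form: the state is the current paddle position and the
        # action is the commanded target position.
        return max(a * abs(action - state) + b, 0.0)

    return time_model


def td_target(reward, next_q_values, action_time, gamma=0.99):
    """Value target discounted by the predicted action time.

    Using gamma ** action_time is an assumption borrowed from
    semi-Markov decision processes, not taken from the paper.
    """
    return reward + (gamma ** action_time) * max(next_q_values)


# Hypothetical controller log: 2 units of travel per step plus 1 step of latency.
measurements = [(d, d / 2.0 + 1.0) for d in range(1, 11)]
time_model = fit_time_model(measurements)

state, action = 3, 9               # current and commanded paddle positions
tau = time_model(state, action)    # predicted action time
target = td_target(reward=1.0, next_q_values=[0.0, 0.5], action_time=tau)
```

Under these assumptions, an action that takes longer to execute contributes a more heavily discounted future value, which is one way a learnt time model could influence value estimates.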

Authors: Jiangcheng Zhu, Zhepei Wang, Douglas Mcilwraith, Chao Wu, Chao Xu, Yike Guo

Affiliations: Institute of Cyber-Systems and Control; Data Science Institute; School of Public Affairs

Source: IET Cyber-Systems and Robotics (EI-indexed; Chinese title: 智能系统与机器人（英文）), 2019, Issue 1, pp. 28-37 (10 pages)

Keywords: action, policy, agent

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

