Abstract
Deep reinforcement learning combines the feature-extraction ability of deep learning with the decision-making ability of reinforcement learning, and has been widely applied in many domains in recent years. Existing work on deep reinforcement learning, however, usually assumes that the system state is fully observable. In practice, an agent's limited perception often prevents it from determining its state with certainty, i.e., the environment is partially observable. Moreover, existing model-free reinforcement learning algorithms typically determine the policy from historical data alone and cannot exploit future information that could aid the agent's decisions. Taking partially observable problems as the application setting, we propose a model-free reinforcement learning algorithm that incorporates future information, as model-based reinforcement learning does, with the future information predicted by Contrastive Predictive Coding (CPC). The proposed algorithm retains the end-to-end training and performance advantages of model-free reinforcement learning while making full use of the predicted information to assist the agent's decisions. The algorithm is evaluated and compared on several partially observable tasks, and the experimental results demonstrate its effectiveness.
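To make concrete what "predicting future information with CPC" can mean, the following is a minimal sketch of the InfoNCE contrastive objective commonly used to train CPC. It is not the authors' implementation: the function names `bilinear_scores` and `info_nce_loss`, the bilinear matrix `W`, and the toy vectors are all assumptions introduced for illustration. Each context vector (a summary of past observations) is scored against every encoded future observation in the batch, and the loss rewards ranking the true future above the others.

```python
import math

def bilinear_scores(contexts, futures, W):
    """Hypothetical helper: score[i][j] = c_i^T W z_j.

    contexts: list of context vectors c_i (summaries of past observations)
    futures:  list of encoded future observations z_j; z_i pairs with c_i
    W:        prediction matrix mapping context space to latent space
    """
    def predict(c):
        # Predicted future latent: c^T W
        return [sum(c[a] * W[a][b] for a in range(len(c)))
                for b in range(len(W[0]))]

    preds = [predict(c) for c in contexts]
    return [[sum(p[b] * z[b] for b in range(len(z))) for z in futures]
            for p in preds]

def info_nce_loss(scores):
    """InfoNCE loss: mean cross-entropy of picking the positive (diagonal)
    entry of each row against the other entries, which act as negatives."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_norm - row[i]
    return total / len(scores)

# Toy batch of 4 (assumed data): futures coincide with contexts.
eye = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]
contexts = [[10.0 * v for v in row] for row in eye]
futures = [row[:] for row in eye]

untrained = [[0.0] * 4 for _ in range(4)]  # all scores equal
loss_untrained = info_nce_loss(bilinear_scores(contexts, futures, untrained))
loss_aligned = info_nce_loss(bilinear_scores(contexts, futures, eye))
```

With the all-zero matrix every candidate scores the same, so the loss equals the logarithm of the batch size; with the identity matrix the true futures score highest and the loss is close to zero. In a reinforcement learning setting such a loss would typically be minimized alongside the policy objective, but how the two are combined in the paper is not stated in this abstract.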
Authors
常芳芳
陈祺航
刘云龙
Chang Fangfang; Chen Qihang; Liu Yunlong (Department of Automation, Xiamen University, Xiamen 361102, China)
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2022, No. 5, pp. 796-804 (9 pages)
Journal of Nanjing University(Natural Science)
Funding
National Natural Science Foundation of China (61772438, 61375077)
Keywords
deep reinforcement learning (DRL)
partially observable environment
contrastive predictive coding (CPC)
future information
representation learning