Abstract: A Multi-Agent based regional traffic coordination control system is proposed. Considering that the traffic flows at the intersections of a road network influence one another, the system constructs a distributed Q-learning algorithm based on a distributed weight function, and uses this algorithm to realize the learning and coordination mechanism of the Multi-Agent system. Through coordinated control among the agents, the control signals at adjacent intersections are coordinated so as to relieve traffic congestion in the road network. Finally, the control algorithm is studied in simulation with the microscopic traffic simulator Paramics, and the simulation results demonstrate its effectiveness.
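The abstract does not give the exact form of the distributed weight function, but the underlying idea of letting each intersection agent blend its own temporal-difference target with value estimates from neighbouring intersections can be sketched as follows. The class and parameter names, the neighbour interface, and the blending scheme are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Hypothetical sketch of one intersection agent's update in a distributed
# Q-learning scheme with a distributed weight function: the agent mixes its
# own TD target with the value estimates of neighbouring intersections.
class IntersectionAgent:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def local_target(self, r, s_next):
        # Standard Q-learning bootstrap from the agent's own table.
        return r + self.gamma * self.Q[s_next].max()

    def update(self, s, a, r, s_next, neighbors, weights):
        # Assumed distributed weight function: weights[0] scales the local
        # target, weights[1:] scale the neighbours' own next-state values.
        target = weights[0] * self.local_target(r, s_next)
        for w, (nb, nb_s_next) in zip(weights[1:], neighbors):
            target += w * nb.Q[nb_s_next].max()
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])
```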
Abstract: For the massive machine type communication (mMTC) scenario, with the goal of maximizing system throughput while guaranteeing the quality of service (QoS) requirements of a subset of machine type communication devices (MTCDs), two Q-learning based resource allocation algorithms are proposed: a centralized Q-learning algorithm (team-Q) and a distributed Q-learning algorithm (dis-Q). First, a clustering algorithm based on cosine similarity (CS) is applied: taking into account the geographic locations and multi-level QoS requirements of the MTCDs, multidimensional vectors representing the MTCDs and the data aggregators (DAs) are constructed, and grouping is completed according to the CS values between the vectors. The team-Q and dis-Q algorithms are then used to allocate resource blocks (RBs) and power to the MTCDs. In terms of throughput, the team-Q and dis-Q algorithms achieve average improvements of 16% and 23%, respectively, over a dynamic resource allocation algorithm and a greedy algorithm; in terms of complexity, the dis-Q algorithm requires no more than 25% of the computation of the team-Q algorithm, while converging nearly 40% faster.
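As a rough illustration of the cosine-similarity grouping step, the sketch below assumes each MTCD and each DA is described by a small feature vector combining its 2-D position with a QoS level, and assigns each MTCD to the DA with the highest CS value. The feature layout and the assignment rule are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def cosine_similarity(u, v):
    # CS value between two feature vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def group_mtcds(mtcd_features, da_features):
    """Assign each MTCD to the DA whose feature vector is most similar."""
    groups = {i: [] for i in range(len(da_features))}
    for m, feat in enumerate(mtcd_features):
        best_da = max(range(len(da_features)),
                      key=lambda d: cosine_similarity(feat, da_features[d]))
        groups[best_da].append(m)
    return groups

# Example: 4 MTCDs, 2 DAs, assumed features = [x, y, qos_level]
mtcds = np.array([[0.0, 1.0, 2], [0.2, 0.9, 2], [5.0, 5.1, 1], [4.8, 5.0, 1]])
das   = np.array([[0.1, 1.0, 2], [5.0, 5.0, 1]])
print(group_mtcds(mtcds, das))   # e.g. {0: [0, 1], 1: [2, 3]}
```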
Abstract: Based on reinforcement learning (RL), a joint communication-computing resource allocation algorithm is studied for heterogeneous networks with hybrid human-machine-thing access, under the constraint of guaranteeing user quality of service (QoS). A novel heterogeneous network topology with hybrid human-machine-thing access is established. Under constraints such as minimum QoS requirements and unmanned aerial vehicle (UAV) transmission power, the joint problem of channel allocation, power allocation, and computing resource allocation is modeled as a multi-objective optimization problem that minimizes system delay and energy consumption. Based on RL theory and the multi-agent Markov decision process, a Distributed Q-learning Communication and Computing joint Resources Allocation (DQ-CCRA) algorithm is proposed. Compared with existing algorithms, it not only reduces the interference from human-type devices to thing-type devices, but also effectively reduces system delay and energy consumption, lowering the total system overhead by 7.4%.
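A minimal sketch of one device agent in a DQ-CCRA-style distributed Q-learning loop is given below, assuming the joint action picks a channel, a power level, and a compute share, and that the delay/energy multi-objective is folded into a scalar reward with fixed weights. These choices, along with all names and parameter values, are illustrative assumptions rather than the algorithm as published.

```python
import random
from collections import defaultdict

# Assumed discrete joint action space: (channel, power level, compute share).
CHANNELS, POWER_LEVELS, CPU_SHARES = range(3), range(4), range(3)
ACTIONS = [(c, p, f) for c in CHANNELS for p in POWER_LEVELS for f in CPU_SHARES]

class DeviceAgent:
    def __init__(self, alpha=0.1, gamma=0.9, eps=0.1):
        self.Q = defaultdict(float)      # (state, action) -> value
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state):
        # Epsilon-greedy choice over the joint action space.
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.Q[(state, a)])

    def learn(self, s, a, delay, energy, s_next, w_delay=0.5, w_energy=0.5):
        # Multi-objective cost folded into a scalar reward (assumed weighting).
        r = -(w_delay * delay + w_energy * energy)
        best_next = max(self.Q[(s_next, a2)] for a2 in ACTIONS)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
```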
Abstract: Traditional reinforcement learning (RL) uses the return, also known as the expected value of cumulative random rewards, for training an agent to learn an optimal policy. However, recent research indicates that learning the distribution over returns has distinct advantages over learning their expected value, as seen in different RL tasks. The shift from using the expectation of returns in traditional RL to the distribution over returns in distributional RL has provided new insights into the dynamics of RL. This paper builds on our recent work investigating the quantum approach towards RL. Our work implements quantile regression (QR) distributional Q learning with a quantum neural network. This quantum network is evaluated in a grid-world environment with different numbers of quantiles, illustrating their influence on the learning behaviour of the algorithm. It is also compared to standard quantum Q learning in a Markov Decision Process (MDP) chain, which demonstrates that quantum QR distributional Q learning can explore the environment more efficiently than standard quantum Q learning. Efficient exploration and balancing of exploitation and exploration are major challenges in RL. Previous work has shown that more informative actions can be taken with a distributional perspective. Our findings suggest another cause for its success: the enhanced performance of distributional RL can be partially attributed to its superior ability to efficiently explore the environment.
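For reference, the quantile fractions and the quantile (pinball) loss that underlie QR distributional Q learning can be written in a few lines. The sketch below uses a plain array of quantile values in place of the quantum neural network described in the abstract; that tabular stand-in is an assumption made only to keep the example self-contained.

```python
import numpy as np

def quantile_fractions(n_quantiles):
    # Midpoint fractions tau_i = (2i + 1) / (2N), as in standard QR distributional Q learning.
    return (2 * np.arange(n_quantiles) + 1) / (2 * n_quantiles)

def quantile_loss(pred_quantiles, target_samples, taus):
    """Pinball loss averaged over predicted quantiles and target samples."""
    # diff[i, j] = target_j - pred_i
    diff = target_samples[None, :] - pred_quantiles[:, None]
    weight = np.abs(taus[:, None] - (diff < 0).astype(float))
    return np.mean(weight * np.abs(diff))

# Example: 4 quantiles approximating a return distribution.
taus = quantile_fractions(4)
pred = np.array([0.0, 0.5, 1.0, 1.5])      # current quantile estimates
target = np.array([0.2, 0.8, 1.1, 1.6])    # Bellman target samples
print(quantile_loss(pred, target, taus))
```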