Funding: Supported by the National Natural Science Foundation of China (Nos. 61174124, 61233003 and 60935001) and the National High Technology Research and Development Program of China (863 Program) (No. 2011AA01A102).
Abstract: In network service systems, satisfying quality of service (QoS) requirements is one of the main objectives. Admission control and resource allocation strategies can be used to guarantee the QoS requirement. Based on partially observable Markov decision processes (POMDPs), this paper proposes a novel admission control model for video on demand (VOD) service systems with elastic QoS. Elastic QoS is also considered in the resource allocation strategy. Policy gradient algorithms can be used to solve POMDP problems with a satisfactory convergence rate. Numerical examples show that the proposed admission control strategy outperforms the complete admission control strategy.
Funding: Supported by the National Defense Pre-research Foundation of China (012015012600A2203).
Abstract: In recent decades, the battlefield environment has become increasingly complex, with a large amount of electronic equipment. To improve the survivability of radar sensors and to meet the requirement of maneuvering target tracking with a low probability of intercept, a non-myopic scheduling method is proposed that minimizes the radiation cost subject to a tracking accuracy constraint. First, the scheduling problem is formulated as a partially observable Markov decision process (POMDP). Then, the tracking accuracy and radiation cost over a finite future time horizon are predicted by the posterior Cramér-Rao lower bound (PCRLB) and the hidden Markov model filter, respectively. Finally, the proposed scheduling is implemented efficiently using a branch and bound (B&B) pruning algorithm. Simulation results show that the improved interacting multiple model (IMM) improves maneuvering target tracking performance, and that the B&B pruning algorithm significantly reduces scheduling time and maximum memory consumption without losing the optimal solution.
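Abstract: Uncertainty and hidden state are major challenges currently facing reinforcement learning. This paper proposes a new algorithm, MA-Q-learning, to find near-optimal policies for POMDP problems with this kind of uncertainty. A memetic algorithm is used to evolve policies, while Q-learning provides predicted rewards that indicate the fitness of the evolved policies. To address the hidden-state problem, the agent's memory of the deterministic history of its most recent finite number of steps is combined with the belief state, which represents a probability distribution over all possible states, to jointly decide the current optimal policy. A hybrid search method is used to improve search efficiency, in which an adjustment factor maintains population diversity and guides the combined crossover and mutation operations. Experimental results on POMDP benchmark instances show that the proposed algorithm outperforms other approximate POMDP algorithms.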
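The belief state mentioned in this abstract, a probability distribution over all possible states, is conventionally maintained with a Bayes filter. The sketch below is not taken from the paper: it shows only the standard update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The function name, the nested-list layout of `T` and `O`, and the two-state "listen" example are all illustrative assumptions.

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update (illustrative, not the paper's code).

    belief[s]        : current probability of state s
    T[a][s][s2]      : probability of moving from s to s2 under action a (assumed layout)
    O[a][s2][o]      : probability of observing o after landing in s2 via a (assumed layout)
    """
    n = len(belief)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state example: one "listen" action that leaves the
# state unchanged and reports the true state with probability 0.85.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
updated = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
```

Starting from a uniform belief, a single observation of 0 shifts most of the probability mass onto state 0. How MA-Q-learning combines such a belief with the agent's finite-step history is specific to the paper and is not reproduced here.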
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60503021 and the High-Tech Research Program of Jiangsu Province of China under Grant No. BG2006027.
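Abstract: Point-based algorithms are a class of approximate algorithms for partially observable Markov decision processes (POMDPs). They perform backup operations only on a set of belief points, which avoids linear programming and uses fewer intermediate variables, thereby shifting the computational bottleneck from selecting vectors to generating vectors. However, these algorithms perform a large amount of repeated and meaningless computation when generating vectors. To address this, a preprocessing method for point-based algorithms (PPBA) is proposed. The method preprocesses each sampled belief point and, before generating an α-vector, first determines which action and which α-vectors should be selected, thereby eliminating repeated computation. PPBA also introduces the concept of basis vectors, exploiting the sparsity of the problem to avoid meaningless computation. Experiments on Perseus show that PPBA greatly improves the execution speed of the algorithm.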
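For context, the backup operation at a single belief point that point-based algorithms repeat can be sketched as follows. This is the generic textbook backup, not PPBA itself: the preprocessing step and basis vectors described in the abstract are not reproduced. The function names and the nested-list layout of `T`, `O` and `R` are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b, alphas, T, O, R, gamma):
    """Generic point-based backup at belief b (illustrative sketch).

    alphas           : current list of alpha-vectors (one value per state)
    T[a][s][s2]      : transition probabilities (assumed layout)
    O[a][s2][o]      : observation probabilities (assumed layout)
    R[a][s]          : immediate reward for action a in state s (assumed layout)
    Returns the new alpha-vector that is best at b.
    """
    n_states = len(b)
    n_obs = len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(n_obs):
            # Project every alpha-vector back through (a, o).
            projections = [
                [
                    sum(T[a][s][s2] * O[a][s2][o] * alpha[s2]
                        for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this belief point.
            best_proj = max(projections, key=lambda g: dot(b, g))
            g_a = [g_a[s] + gamma * best_proj[s] for s in range(n_states)]
        val = dot(b, g_a)
        if val > best_val:
            best_vec, best_val = g_a, val
    return best_vec
```

Note that the inner loop recomputes every projection for every belief point, even though only one projection per observation is kept. That is the kind of repeated work in vector generation that the abstract says PPBA is designed to eliminate.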