Journal Articles
8 articles found
1. Heterogeneous Network Selection Optimization Algorithm Based on a Markov Decision Model (Cited by 7)
Authors: Jianli Xie, Wenjuan Gao, Cuiran Li. China Communications, SCIE CSCD, 2020, No. 2, pp. 40-53 (14 pages)
A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated by using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved by using the genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance, and can guarantee that users with different service types obtain satisfactory expected total reward values with low numbers of network handoffs.
Keywords: Heterogeneous wireless networks; Markov decision process; Reward function; Genetic algorithm; Simulated annealing
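The AHP weighting step mentioned in this abstract can be illustrated with a short sketch; the 3×3 pairwise comparison matrix below is a made-up example for three QoS attributes (e.g., bandwidth, delay, packet loss), not a matrix from the paper.

```python
import numpy as np

# Hypothetical pairwise comparison matrix for three QoS attributes;
# the values are illustrative only.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# AHP weights: principal eigenvector of A, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
principal = eigvecs[:, np.argmax(eigvals.real)].real
weights = principal / principal.sum()

# Consistency check (random index RI = 0.58 for a 3x3 matrix).
lambda_max = eigvals.real.max()
ci = (lambda_max - len(A)) / (len(A) - 1)
cr = ci / 0.58
print("weights:", weights.round(3), "CR:", round(cr, 3))
```

The consistency ratio check (CR < 0.1) is the usual sanity test that the comparison matrix is coherent enough for its principal eigenvector to serve as the attribute weight vector.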
2. A New Theoretical Framework of Pyramid Markov Processes for Blockchain Selfish Mining (Cited by 2)
Authors: Quanlin Li, Yanxia Chang, Xiaole Wu, Guoqing Zhang. Journal of Systems Science and Systems Engineering, SCIE EI CSCD, 2021, No. 6, pp. 667-711 (45 pages)
In this paper, we provide a new theoretical framework of pyramid Markov processes to solve some open and fundamental problems of blockchain selfish mining under a rigorous mathematical setting. We first describe a more general model of blockchain selfish mining with both a two-block leading competitive criterion and a new economic incentive mechanism. Then we establish a pyramid Markov process and show that it is irreducible and positive recurrent, and that its stationary probability vector is matrix-geometric with an explicitly representable rate matrix. We also use the stationary probability vector to study the influence of orphan blocks on the waste of computing resources. Next, we set up a pyramid Markov reward process to investigate the long-run average mining profits of the honest and dishonest mining pools, respectively. As a by-product, we build one-dimensional Markov reward processes and provide some new and interesting interpretations of the Markov chain and the revenue analysis reported in the seminal work by Eyal and Sirer (2014). The pyramid Markov (reward) processes open up a new avenue in the study of blockchain selfish mining, and we hope that the methodology and results developed in this paper shed light on selfish mining so that a series of promising research directions can be developed.
Keywords: Blockchain; Proof of Work; Selfish mining; Main chain; Pyramid Markov process; Pyramid Markov reward process; Phase-type distribution; Matrix-geometric solution
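As a rough illustration of what a matrix-geometric stationary vector with a computable rate matrix looks like, the sketch below solves the rate-matrix equation of a toy level-independent QBD process; the generator blocks are invented and are not the pyramid process constructed in the paper.

```python
import numpy as np

# Toy level-independent QBD (a Markov-modulated M/M/1 queue) used only to
# show the matrix-geometric machinery; NOT the paper's pyramid process.
A0 = np.array([[1.0, 0.0], [0.0, 2.0]])    # transitions one level up
A1 = np.array([[-6.0, 1.0], [2.0, -8.0]])  # within-level transitions
A2 = np.array([[4.0, 0.0], [0.0, 4.0]])    # transitions one level down

# Minimal nonnegative solution of A0 + R*A1 + R^2*A2 = 0 by successive
# substitution: R <- -(A0 + R^2 A2) A1^{-1}.
R = np.zeros_like(A0)
A1_inv = np.linalg.inv(A1)
for _ in range(200):
    R_next = -(A0 + R @ R @ A2) @ A1_inv
    if np.max(np.abs(R_next - R)) < 1e-12:
        R = R_next
        break
    R = R_next

# Level-k stationary probabilities are then matrix-geometric: pi_k = pi_1 R^(k-1).
print("rate matrix R:\n", R.round(4))
print("spectral radius:", np.max(np.abs(np.linalg.eigvals(R))).round(4))
```

For a stable process the spectral radius of R is below 1, which is what makes the level probabilities decay geometrically.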
3. Convergence analysis of an incremental approach to online inverse reinforcement learning
Authors: Zhuo-jun JIN, Hui QIAN, Shen-yi CHEN, Miao-liang ZHU. Journal of Zhejiang University-Science C (Computers and Electronics), SCIE EI, 2011, No. 1, pp. 17-24 (8 pages)
Interest in inverse reinforcement learning (IRL), that is, the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert, has recently increased. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes during the learning process and the regret were provided with a detailed proof. Then an online algorithm based on incremental error correcting was derived to deal with the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, which leads to an estimate that approaches a target optimal value. The proposed method was tested in a driving simulation experiment and found to be able to efficiently recover an adequate reward function.
Keywords: Incremental approach; Reward recovering; Online learning; Inverse reinforcement learning; Markov decision process
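A hedged sketch of the error-correcting idea described above, assuming a linear reward r(s) = w·φ(s) on a toy deterministic chain MDP; the exact increment rule and convergence bounds of the paper are not reproduced here.

```python
import numpy as np

# Tiny 4-state chain MDP used only to illustrate the incremental idea:
# bump the reward estimate whenever the policy induced by the current
# estimate disagrees with the expert (perceptron-style sketch).
n_states, n_actions, gamma = 4, 2, 0.9
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0               # action 0: move left
    P[1, s, min(s + 1, n_states - 1)] = 1.0    # action 1: move right
phi = np.eye(n_states)                         # one-hot state features
expert = np.array([1, 1, 1, 1])                # expert always moves right

def greedy_policy(w, iters=200):
    """Value iteration under reward phi @ w, then the greedy policy."""
    r = phi @ w
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

w, eta = np.zeros(n_states), 0.5
for _ in range(50):                            # online rounds
    pi = greedy_policy(w)
    mismatches = np.where(pi != expert)[0]
    if len(mismatches) == 0:
        break
    for s in mismatches:                       # error-correcting increment
        s_exp = P[expert[s], s].argmax()       # next state under expert action (deterministic)
        s_cur = P[pi[s], s].argmax()           # next state under current action
        w += eta * (phi[s_exp] - phi[s_cur])
print("recovered reward weights:", w)
```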
4. Reliability Evaluation of a Traction System Based on the Markov Reward Process (Cited by 1)
Authors: 李小波, 褚敏, 陆朱剑, 程岳梅, 田世贺. 《智能计算机与应用》 (Intelligent Computer and Applications), 2020, No. 2, pp. 89-92, 96 (5 pages)
To address the reliability of metro train traction systems, a reliability evaluation method combining the analytic hierarchy process (AHP) and the Markov reward process is proposed. The AHP is used to determine the reliability evaluation indices of the traction system at the module level and to compute comprehensive weights, from which the reward coefficients for each module entering different states are determined; the Markov reward process is then used to capture the variation of system reliability under different decay (discount) coefficients, yielding a reliability evaluation model for the traction system. The model provides a valuable reference for evaluating the reliability of metro train traction systems and for formulating maintenance strategies.
Keywords: Metro traction system; Reliability; Analytic hierarchy process; Markov reward process
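A minimal sketch of the Markov-reward part of such an evaluation, assuming a three-state (normal/degraded/failed) continuous-time model; the generator rates and reward coefficients below are illustrative stand-ins, not the module-level indices or AHP weights from the paper.

```python
import numpy as np

# Illustrative three-state continuous-time Markov reward model.
Q = np.array([
    [-0.02,  0.015, 0.005],   # normal   -> degraded / failed
    [ 0.05, -0.09,  0.04 ],   # degraded -> repaired / failed
    [ 0.10,  0.00, -0.10 ],   # failed   -> repaired
])
r = np.array([1.0, 0.6, 0.0])  # reward coefficient of each state (e.g., AHP-derived weights)

def discounted_reward(alpha):
    """Expected total discounted reward v, the solution of (alpha*I - Q) v = r."""
    return np.linalg.solve(alpha * np.eye(len(r)) - Q, r)

for alpha in (0.01, 0.05, 0.1):   # different decay (discount) coefficients
    print(f"alpha={alpha}: v = {discounted_reward(alpha).round(3)}")
```

Varying alpha mimics the paper's study of how the reliability measure changes with the decay coefficient.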
5. Verification of Continuous-Time Markov Reward Processes with Action Rewards
Authors: 黄镇谨, 陆阳, 杨娟, 王智文. 《电子测量与仪器学报》 (Journal of Electronic Measurement and Instrumentation), CSCD, Peking University Core, 2015, No. 11, pp. 1603-1613 (11 pages)
To express the time-space verification of uncertain complex systems more accurately, and to address the fact that current verification of continuous-time Markov reward decision processes (CMRDPs) considers only state rewards, a verification method with action rewards is proposed. Spatial performance constraints incorporating action rewards are introduced to extend the existing state-reward-based continuous-time Markov reward process, and regular expressions are used to express the path specification of the verified property, extending the expressiveness of existing path operators. A product model of the action-reward CMRDP and the path specification is constructed, its induced Markov reward model (MRM) under a deterministic policy is derived, and the time-space performance verification on the CMRDP is thereby converted into a time-space reachability probability analysis on the MRM; an algorithm for computing the reachability probability in the MRM is also proposed. A case study shows that the proposed verification approach and algorithm are feasible.
Keywords: Markov reward process; Model verification; Action reward; Time- and space-bounded reachability probability
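As a rough illustration of reducing verification to reachability analysis, the sketch below computes a time-bounded reachability probability on a toy CTMC by making the target state absorbing; the action rewards, product model, and regular-expression path specification from the paper are not modelled here.

```python
import numpy as np
from scipy.linalg import expm

# Toy CTMC generator; state 2 is the verification target, made absorbing
# so that the transient probability of being in it equals the probability
# of having reached it by time t.
Q = np.array([
    [-3.0,  2.0,  1.0],
    [ 1.0, -2.0,  1.0],
    [ 0.0,  0.0,  0.0],
])

def reach_prob(Q, init, target, t):
    """P(reach `target` within time t | start in `init`)."""
    Pt = expm(Q * t)          # transient distribution matrix at time t
    return Pt[init, target]

for t in (0.5, 1.0, 2.0):
    print(f"t={t}: P(reach state 2) = {reach_prob(Q, 0, 2, t):.4f}")
```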
6. Reward Processes and Performance Optimization of Asymmetric Supermarket Models
Authors: 李泉林, 丁园园, 杨飞飞. 《应用概率统计》 (Chinese Journal of Applied Probability and Statistics), CSCD, Peking University Core, 2015, No. 4, pp. 411-431 (21 pages)
Thanks to its simple operation, fast response, and real-time control, the supermarket model has become an important mathematical tool for studying resource management in large-scale networks, and it has been widely applied in important practical fields such as the Internet of Things, cloud computing, cloud manufacturing, big data, transportation, and healthcare. The asymmetric supermarket model is currently an important topic in this research direction. In this paper, we study an asymmetric supermarket model. Because the M servers are not identical, the routing policy of arriving customers becomes rather complicated: it depends not only on the queue lengths and service rates but also on the reputation of the servers. To this end, we construct the routing selection policy of the asymmetric supermarket model by means of decision methods. On this basis, we use Markov reward processes and their optimization techniques to establish the functional reward equations of this asymmetric supermarket model, and give a value-recursive algorithm for these equations; through a bidirectional optimization of the reward function, we provide a performance evaluation criterion for the study of this class of asymmetric supermarket models. To understand how the asymmetric supermarket model achieves effective control of large-scale network resources through objective conditions and subjective behavior, the methods and results of this paper provide, for the first time, some of the necessary theoretical foundations in this direction.
Keywords: Asymmetric supermarket model; Routing selection policy; Markov reward process; Reward function; Value-recursive algorithm
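A hedged sketch of a routing rule in the spirit described above, where the choice among sampled servers depends on queue length, service rate, and a reputation weight; the scoring function and parameters are illustrative assumptions, not the policy constructed in the paper.

```python
import random

# Illustrative heterogeneous servers: queue length, service rate, reputation.
servers = [
    {"queue": 3, "mu": 2.0, "reputation": 0.9},
    {"queue": 1, "mu": 1.0, "reputation": 0.6},
    {"queue": 4, "mu": 3.0, "reputation": 0.8},
]

def route(servers, d=2, w_rep=0.5):
    """Probe d servers at random and send the arrival to the one with the
    smallest expected workload, discounted by a reputation bonus."""
    probed = random.sample(servers, d)
    return min(probed, key=lambda s: (s["queue"] + 1) / s["mu"] - w_rep * s["reputation"])

chosen = route(servers)
chosen["queue"] += 1
print("routed to server with mu =", chosen["mu"])
```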
7. Asymptotic Evaluations of the Stability Index for a Markov Control Process with the Expected Total Discounted Reward Criterion
Author: Jaime Eduardo Martínez-Sánchez. American Journal of Operations Research, 2021, No. 1, pp. 62-85 (24 pages)
In this work, a numerical estimate of the stability index is made for a controlled consumption-investment process with the discounted reward optimization criterion. Using explicit formulas for the optimal stationary policies and for the value functions, the stability index is explicitly calculated, and its asymptotic behavior as the discount coefficient approaches 1 is investigated through statistical techniques and numerical experiments. The results obtained define the conditions under which an approximate optimal stationary policy can be used to control the original process.
Keywords: Control consumption-investment process; Discrete-time Markov control process; Expected total discounted reward; Probabilistic metrics; Stability index estimation
8. Variance Optimization for Continuous-Time Markov Decision Processes
Author: Yaqing Fu. Open Journal of Statistics, 2019, No. 2, pp. 181-195 (15 pages)
This paper considers the variance optimization problem of the average reward in a continuous-time Markov decision process (MDP). It is assumed that the state space is countable and the action space is a Borel measurable space. The main purpose of this paper is to find the policy with the minimal variance in the deterministic stationary policy space. Unlike the traditional Markov decision process, the cost function under the variance criterion is affected by future actions. To this end, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by giving a policy iteration algorithm for the pseudo-variance optimization problem, the optimal policy of the original variance optimization problem is derived, and a sufficient condition for the variance-optimal policy is given. Finally, an example is used to illustrate the conclusions of this paper.
Keywords: Continuous-time Markov decision process; Variance optimality of average reward; Optimal policy of variance; Policy iteration
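A generic policy-iteration sketch on a tiny discounted MDP, included only to illustrate the policy-iteration machinery that the paper applies to its pseudo-variance problem; the continuous-time setting and the pseudo-variance criterion themselves are not reproduced here.

```python
import numpy as np

# Random tiny MDP: P[a, s, s'] transition probabilities, R[a, s] rewards.
n_s, n_a, gamma = 3, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))
R = rng.normal(size=(n_a, n_s))

policy = np.zeros(n_s, dtype=int)
for _ in range(100):
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[policy, np.arange(n_s)]
    R_pi = R[policy, np.arange(n_s)]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    # Policy improvement: greedy with respect to Q(s, a).
    Q = R + gamma * P @ V            # shape (n_a, n_s)
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("optimal policy:", policy, "values:", V.round(3))
```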