Quay crane scheduling and yard truck scheduling are two important subproblems in container terminal operations that have been studied separately in previous research. This paper proposes a new problem: the integrated scheduling of quay cranes and yard trucks for inbound containers. The problem is formulated as a mixed integer programming (MIP) model. Because the model is intractable, a genetic algorithm (GA) and a modified Johnson's Rule-based heuristic algorithm (MJRHA) are used to solve it. In addition, two closed-form lower bounds are given to evaluate the solution accuracy. Computational experiments show that the solution algorithms can efficiently handle the scheduling problem and that the integrated methods are very useful.
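For orientation, the classical Johnson's Rule that MJRHA takes its name from can be sketched as follows. This is the textbook two-machine flow-shop rule, not the paper's modified heuristic. Treating stage 1 as quay crane unloading and stage 2 as yard truck transport is an illustrative assumption, and the function names and job data are hypothetical.

```python
def johnson_order(jobs):
    """Order jobs for a two-machine flow shop using classical Johnson's Rule.

    jobs: list of (name, time_on_stage_1, time_on_stage_2).
    Jobs that are faster on stage 1 go first, in increasing stage-1 time;
    the remaining jobs go last, in decreasing stage-2 time.
    """
    front = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    back = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return front + back


def makespan(order):
    """Completion time of the last job when every job visits stage 1, then stage 2."""
    stage1_free = 0
    stage2_free = 0
    for _, t1, t2 in order:
        stage1_free += t1                               # stage 1 works without idling
        stage2_free = max(stage2_free, stage1_free) + t2  # stage 2 waits for stage 1
    return stage2_free


# Hypothetical container jobs: (name, crane time, truck time).
jobs = [("A", 3, 6), ("B", 5, 2), ("C", 1, 2), ("D", 6, 6), ("E", 7, 5)]
order = johnson_order(jobs)
print([name for name, _, _ in order], makespan(order))
```

For the two-machine flow shop with makespan as the objective, this ordering is known to be optimal, which is why it is a natural building block for heuristics on harder integrated problems.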
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP converges to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed under the control policies generated by MsHDP. A general stability criterion is also designed to determine the admissibility of the current control policy; the criterion applies not only to traditional value iteration and policy iteration but also to MsHDP. Further, based on the convergence result and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to greatly accelerate learning. In addition, an actor-critic structure is used to implement the integrated MsHDP scheme, where neural networks serve as the parametric architecture to evaluate and improve the iterative policy. Finally, two simulation examples demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
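The idea of starting from the zero cost function can be illustrated with ordinary one-step value iteration on a scalar linear-quadratic problem. This is a minimal sketch under stated assumptions, not the MsHDP algorithm itself: the dynamics x' = a*x + b*u, the stage cost q*x^2 + r*u^2, the quadratic value form V_i(x) = p_i*x^2, and the function name are all chosen here for illustration.

```python
def value_iteration_lq(a, b, q, r, iterations=50):
    """One-step value iteration for a scalar linear-quadratic problem.

    With V_i(x) = p_i * x**2, the Bellman update reduces to a scalar
    Riccati recursion. Starting from p_0 = 0 (the zero cost function),
    the sequence p_i approaches the solution of the algebraic Riccati equation.
    """
    p = 0.0
    history = [p]
    for _ in range(iterations):
        # Minimizing q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u gives this update.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        history.append(p)
    return history


history = value_iteration_lq(a=1.0, b=1.0, q=1.0, r=1.0)
print(history[:5], history[-1])
```

For a = b = q = r = 1 the fixed point satisfies p^2 - p - 1 = 0, so the iterates increase from 0 toward (1 + sqrt(5)) / 2.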
Since analog systems play an essential role in modern equipment, test strategy optimization for analog systems has attracted extensive attention in both academia and industry. Although many methods exist for implementing effective test strategies, diagnosis for analog systems suffers from the impact of various stresses caused by sophisticated mechanisms and variable operational conditions. Consequently, the generated solutions are impractical because of the systems' topology and the influence of information redundancy. Additionally, independent tests operating sequentially on the generated strategies may increase time consumption. To overcome these weaknesses, we propose a novel approach called heuristic programming (HP) to generate a mixture of test strategies. The experimental results show that HP and Rollout-HP obtain strategies with fewer layers and lower cost consumption than state-of-the-art methods. Both HP and Rollout-HP provide more practical strategies than other methods. Additionally, the cost consumption of the strategies based on HP and Rollout-HP is improved compared with that of other methods because of the updating of the test cost and the adaptation of mixture OR nodes. Hence, the proposed HP and Rollout-HP methods have high efficiency.
Optimal layout of rectangular stock cutting is still in great demand from industry for diversified applications. This paper introduces four basic solution methods for the problem: linear programming, dynamic programming, tree search, and a heuristic approach. A prototype of application software is developed to verify the pros and cons of the various approaches.
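As one concrete instance of the dynamic programming approach, the sketch below solves a simplified variant: unconstrained guillotine cutting with integer dimensions, unlimited copies of each piece, and no rotation. These restrictions, the function name, and the piece data are assumptions made for illustration; the abstract does not state which formulation the paper uses.

```python
from functools import lru_cache


def best_guillotine_value(width, height, pieces):
    """Maximum total value obtainable from a width x height sheet.

    pieces: list of (piece_width, piece_height, value).
    Only edge-to-edge (guillotine) cuts at integer positions are allowed.
    """
    pieces = tuple(pieces)

    @lru_cache(maxsize=None)
    def best(w, h):
        # Option 1: use the sheet for a single piece and discard the rest.
        value = max((v for pw, ph, v in pieces if pw <= w and ph <= h), default=0)
        # Option 2: vertical cut; by symmetry only the first half of positions is needed.
        for x in range(1, w // 2 + 1):
            value = max(value, best(x, h) + best(w - x, h))
        # Option 3: horizontal cut.
        for y in range(1, h // 2 + 1):
            value = max(value, best(w, y) + best(w, h - y))
        return value

    return best(width, height)


# Hypothetical order list: a 2x3 piece worth 7 and a 1x1 piece worth 1.
print(best_guillotine_value(3, 3, [(2, 3, 7), (1, 1, 1)]))
```

Memoizing on the sheet dimensions keeps the number of subproblems at width times height, each of which tries every cut position once.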
Dyna is an effective reinforcement learning (RL) approach that combines value function evaluation with model learning. However, existing works on Dyna mostly discuss only its efficiency in RL problems with discrete action spaces. This paper proposes a novel Dyna variant, called Dyna-LSTD-PA, aiming to handle problems with continuous action spaces. Dyna-LSTD-PA stands for Dyna based on least-squares temporal difference (LSTD) and policy approximation. Dyna-LSTD-PA consists of two simultaneous, interacting processes. The learning process determines the probability distribution over action spaces using the Gaussian distribution; estimates the underlying value function, policy, and model by linear representation; and updates their parameter vectors online by LSTD(λ). The planning process updates the parameter vector of the value function again by using offline LSTD(λ). Dyna-LSTD-PA also uses the Sherman-Morrison formula to improve the efficiency of LSTD(λ), and weights the parameter vector of the value function to bring the two processes together. Theoretically, the global error bound is derived by considering approximation, estimation, and model errors. Experimentally, Dyna-LSTD-PA outperforms two representative methods in terms of convergence rate, success rate, and stability performance on four benchmark RL problems.
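The role of the Sherman-Morrison formula in LSTD can be sketched as follows: each transition adds a rank-one term to the LSTD matrix, so its inverse can be updated directly instead of being recomputed. This is a minimal pure-Python sketch of recursive LSTD(0), that is, the λ = 0 case without eligibility traces; it is not the Dyna-LSTD-PA update, and the function names, the regularization `eps`, and the two-state example are assumptions made here.

```python
def sherman_morrison_update(a_inv, u, v):
    """Return (A + u v^T)^-1 given a_inv = A^-1, using the Sherman-Morrison formula."""
    n = len(a_inv)
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    v_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[k] * a_inv_u[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("rank-one update makes the matrix singular")
    return [[a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(n)]
            for i in range(n)]


def recursive_lstd0(transitions, n_features, gamma, eps=1e-6):
    """Estimate linear value-function weights from (phi, reward, phi_next) samples.

    Starts from A = eps * I (an assumed regularizer) and applies one
    rank-one update A += phi (phi - gamma * phi_next)^T per transition.
    """
    a_inv = [[(1.0 / eps if i == j else 0.0) for j in range(n_features)]
             for i in range(n_features)]
    b = [0.0] * n_features
    for phi, reward, phi_next in transitions:
        diff = [p - gamma * pn for p, pn in zip(phi, phi_next)]
        a_inv = sherman_morrison_update(a_inv, phi, diff)
        b = [bi + reward * p for bi, p in zip(b, phi)]
    # theta = A^-1 b
    return [sum(a_inv[i][k] * b[k] for k in range(n_features)) for i in range(n_features)]


# Hypothetical two-state chain with one-hot features:
# state 0 moves to state 1 with reward 1; state 1 loops on itself with reward 0.
samples = [([1.0, 0.0], 1.0, [0.0, 1.0]), ([0.0, 1.0], 0.0, [0.0, 1.0])]
print(recursive_lstd0(samples, n_features=2, gamma=0.9))
```

Each update costs on the order of n^2 operations for n features, compared with roughly n^3 for inverting the matrix from scratch. In an LSTD(λ) variant the vector `phi` in the rank-one term would be replaced by the eligibility trace.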
Funding: supported by the National Key Research and Development Program of China (2021ZD0112302), the National Natural Science Foundation of China (62222301, 61890930-5, 62021003), and the Beijing Natural Science Foundation (JQ19013).
Funding: Project supported by the Youth and Middle-Aged Scientific and Technological Innovation Leading Talents Program of the Corps, China (No. 2020 JDT0008).