Funding: Supported by the National Natural Science Foundation of China under Grant No. 62001199 and the Fujian Province Natural Science Foundation under Grant No. 2023J01925.
Abstract: In the domain of autonomous industrial manipulators, precise positioning and appropriate posture selection in path planning are pivotal for tasks involving obstacle avoidance, such as handling, heat sealing, and stacking. While Multi-Degree-of-Freedom (MDOF) manipulators offer kinematic redundancy, aiding in the derivation of optimal inverse kinematic solutions that meet position and posture requirements, their path planning entails intricate multiobjective optimization encompassing path, posture, and joint motion. Achieving satisfactory results in practical scenarios remains challenging. In response, this study introduces a novel Reverse Path Planning (RPP) methodology tailored for industrial manipulators. The approach begins by modeling the manipulator's end-effector as an agent within a reinforcement learning (RL) framework, wherein the state space, action set, and reward function are precisely defined to expedite the search for an initial collision-free path. To improve convergence speed, the Q-learning algorithm is augmented with Dyna-Q. Additionally, we formulate the cylindrical bounding box of the manipulator from its Denavit-Hartenberg (DH) parameters and propose a fast collision detection technique. Furthermore, the motion performance of the end-effector is refined through a bidirectional search, and joint weighting coefficients are introduced to reduce motion in high-power joints. The efficacy of the proposed RPP methodology is examined through extensive simulations on a six-degree-of-freedom (6-DOF) manipulator under two distinct obstacle configurations and target positions. Experimental results show that the RPP method computes the shortest collision-free path while satisfying specific posture constraints at the target point. Moreover, it minimizes both posture-angle deviations and joint motion, enhancing the operational performance of MDOF industrial manipulators.
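To make the model-based planning step concrete, the following is a minimal illustrative sketch of a tabular Dyna-Q loop of the kind this abstract describes: each real transition drives a one-step Q-learning update, is stored in a learned model, and is then replayed several times as simulated planning updates. The `env` interface (reset/step/actions), the deterministic-model assumption, and all hyperparameters are assumptions for illustration, not the authors' implementation.

```python
# Minimal Dyna-Q sketch: tabular Q-learning plus n planning updates replayed
# from a learned model. Environment interface and hyperparameters are assumed.
import random
from collections import defaultdict

def dyna_q(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1, n_planning=20):
    Q = defaultdict(float)   # Q[(state, action)] -> estimated return
    model = {}               # learned model: (state, action) -> (reward, next_state)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection over the discrete action set
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s_next, r, done = env.step(a)
            # direct one-step Q-learning update from the real transition
            target = r if done else r + gamma * max(Q[(s_next, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # model learning (a deterministic environment is assumed here)
            model[(s, a)] = (r, s_next)
            # planning: n extra updates from transitions replayed out of the model
            for _ in range(n_planning):
                (ps, pa), (pr, pns) = random.choice(list(model.items()))
                p_target = pr + gamma * max(Q[(pns, a_)] for a_ in env.actions)
                Q[(ps, pa)] += alpha * (p_target - Q[(ps, pa)])
            s = s_next
    return Q
```

In the manipulator setting above, the state would encode the discretized end-effector position and the actions its candidate motion directions; the extra planning updates are what accelerate convergence relative to plain Q-learning.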
Abstract: The mobile robot path planning problem is one of the main topics in reinforcement learning research. In traditional reinforcement learning, the agent accumulates reward while interacting with the environment and eventually converges to the optimal strategy. The Dyna learning framework additionally learns an estimation model of the real environment; the virtual samples generated by this model are used together with the empirical samples obtained from the real environment to update the value function or policy function, which improves convergence efficiency. However, when reinforcement learning is applied to path planning tasks, continuous motion in large-scale continuous environments cannot be handled, and convergence is poor. In this paper, we use a radial basis function neural network (RBFNN) to approximate the Q-value table in the Dyna-Q algorithm to address these drawbacks. Experimental results show that the convergence speed of the improved Dyna-RQ algorithm is significantly faster, which improves the efficiency of mobile robot path planning.
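The core change in Dyna-RQ is replacing the Q-value table with a function approximator so that continuous states can be handled. Below is a hedged sketch of one way to do this with a Gaussian radial basis function network whose linear output weights are trained on temporal-difference targets; the centers, width, learning rate, and action count are illustrative assumptions rather than the paper's configuration.

```python
# Illustrative RBF approximation of the Q-function: Gaussian features over a
# continuous state, one linear output weight vector per discrete action.
import numpy as np

class RBFQ:
    def __init__(self, centers, sigma=0.5, n_actions=4, lr=0.05):
        self.centers = np.asarray(centers)            # (n_rbf, state_dim)
        self.sigma = sigma
        self.W = np.zeros((len(self.centers), n_actions))  # linear output weights
        self.lr = lr

    def features(self, state):
        d2 = np.sum((self.centers - np.asarray(state)) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))  # Gaussian RBF activations

    def q_values(self, state):
        return self.features(state) @ self.W          # one Q-value per action

    def update(self, state, action, target):
        phi = self.features(state)
        td_error = target - phi @ self.W[:, action]
        self.W[:, action] += self.lr * td_error * phi  # gradient step on squared TD error
```

A Dyna-style agent would call `q_values` for action selection and `update` with targets r + γ·max_a′ Q(s′, a′) for both real and model-generated transitions.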
Funding: Project (2010-0012609) supported by the Basic Science Research Program, Korea.
Abstract: A novel approach is presented to solve the navigation problem of autonomous mobile robots in unknown environments with dense obstacles, based on a univector field method. In an obstacle-free environment, a robot that follows the univector field is guaranteed to reach the goal position with the desired posture. In cluttered environments, however, the univector field alone cannot guarantee that the robot avoids obstacles. To create an intelligent mobile robot able to perform obstacle avoidance while following the univector field, the Dyna-Q algorithm is used to train the robot to learn moving directions that yield a collision-free path for its navigation. Computer simulations and real-world experiments show that the proposed algorithm is effective for training the robot to reach the goal position with the desired final orientation.
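As an illustration of the first ingredient, the sketch below builds a simple univector-style field: far from the goal the heading points straight at the goal, and as the robot approaches, the heading is blended toward the desired arrival orientation so that following the field ends in the desired posture. The blending rule and gain are one possible construction chosen for illustration, not the field definition used in the paper.

```python
# Illustrative univector-style field: returns a unit heading that converges on
# the goal and rotates toward the desired final orientation near the goal.
import math

def univector_heading(pos, goal, desired_theta, k=2.0):
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)            # direction straight at the goal
    # weight shifts from the straight bearing (far away) to the desired
    # arrival heading (near the goal)
    w = math.exp(-k * dist)
    # interpolate the two angles through the shorter arc
    diff = math.atan2(math.sin(desired_theta - bearing),
                      math.cos(desired_theta - bearing))
    heading = bearing + w * diff
    return math.cos(heading), math.sin(heading)   # unit direction vector
```

The Dyna-Q policy described in the abstract would then override or correct this heading whenever the learned Q-values indicate that following the field leads toward an obstacle.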
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2010-0012609).
Abstract: This paper presents an extended Dyna-Q algorithm that improves the efficiency of the standard Dyna-Q algorithm. In the first episodes of standard Dyna-Q, the agent travels blindly to find the goal position. To overcome this weakness, our approach uses a maximum likelihood model of all state-action pairs to choose actions and update Q-values during the first few episodes. The algorithm is compared with the one-step Q-learning algorithm and the standard Dyna-Q algorithm on the path planning problem in maze environments. Experimental results show that the proposed algorithm is more efficient than both, especially in environments with a large number of states.
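The extension amounts to learning a maximum-likelihood model of the environment from observed transitions and acting greedily on it during the first episodes instead of exploring blindly. The sketch below shows one way such a model can be represented with transition counts and mean rewards; the data structures, the default value for unvisited pairs, and the one-step lookahead rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a maximum-likelihood environment model: empirical next-state
# distribution and mean reward per (state, action), used for greedy lookahead.
from collections import defaultdict

class MLModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.reward_sum = defaultdict(float)                  # (s, a) -> total reward
        self.visits = defaultdict(int)                        # (s, a) -> visit count

    def observe(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def expected_value(self, s, a, V, gamma=0.95):
        """Maximum-likelihood one-step lookahead: mean reward plus discounted
        expectation of V over the empirical next-state distribution."""
        n = self.visits[(s, a)]
        if n == 0:
            return 0.0                                        # unvisited pair: neutral default
        r_hat = self.reward_sum[(s, a)] / n
        exp_v = sum(c / n * V[s2] for s2, c in self.counts[(s, a)].items())
        return r_hat + gamma * exp_v

def greedy_action(model, s, actions, V):
    # during the first episodes, act greedily on the learned model rather than blindly
    return max(actions, key=lambda a: model.expected_value(s, a, V))
```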