Funding: Supported by the National Science and Technology Major Project (2021ZD0112702), the National Natural Science Foundation (NNSF) of China (62373100, 62233003), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Abstract: This article studies the problem of effective traffic signal control across multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. To achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance than state-of-the-art models.
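The abstract does not state the zoning rule, so the following is only a minimal sketch of one way density-based zoning could work. It assumes that two intersections belong to the same region whenever a chain of roads with density at or above a threshold connects them. The function name `zone_by_density`, the threshold, and the density values are all hypothetical and are not taken from RegionSTLight.

```python
from collections import defaultdict


def zone_by_density(num_nodes, road_density, threshold):
    """Group intersections into regions (illustrative rule, not the paper's).

    Two intersections share a region when they are linked by a chain of
    roads whose density is >= threshold. Uses a small union-find structure.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), density in road_density.items():
        if density >= threshold:
            parent[find(u)] = find(v)  # merge the two regions

    regions = defaultdict(list)
    for node in range(num_nodes):
        regions[find(node)].append(node)
    return sorted(regions.values())


# Hypothetical 5-intersection corridor; densities are made-up values.
density = {(0, 1): 0.8, (1, 2): 0.2, (2, 3): 0.7, (3, 4): 0.9}
regions = zone_by_density(5, density, threshold=0.5)
print(regions)
```

Re-running the function on fresh density readings yields a new partition each time, which is the sense in which such a zoning would be dynamic. Under the regional framework described above, each resulting region would then contribute its own local Q value.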
Abstract: Quantization compresses a network by reducing the numerical bit width of the model, which improves computation speed. Reducing the data bit width, however, causes a loss of accuracy, and different layers differ in redundancy and in sensitivity to bit width. It is therefore difficult to determine the optimal bit width for different parts of the network while guaranteeing accuracy. Mixed-precision quantization can effectively reduce the amount of computation while keeping model accuracy essentially unchanged. In this paper, a hardware-aware algorithm for optimally assigning mixed-precision quantization strategies, adapted to low bit widths, is proposed, and reinforcement learning is used to automatically predict a mixed precision that meets hardware resource constraints. In the state-space design, the standard deviation of the weights is used to measure differences in data distribution; the execution speed feedback from simulated neural network accelerator inference is used as the environment to limit the action space of the agent; and the accuracy of the quantized model after retraining is used as the reward function to guide the agent's deep reinforcement learning training. The experimental results show that the proposed method obtains a suitable layer-by-layer quantization strategy while satisfying the computational resource constraints, and that model accuracy is effectively improved. The proposed method is automated and reasonably general, and it has strong application potential in mixed-precision quantization and embedded neural network model deployment.
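To make the ingredients named in the abstract concrete, here is a minimal sketch of three of them: quantizing one layer's weights to a given bit width, building a state from the weight standard deviation, and restricting the action space to bit widths that fit a hardware budget. It assumes uniform symmetric quantization and a caller-supplied cost function. The names `quantize`, `layer_state`, and `allowed_bits` are hypothetical, and the cost model stands in for the simulated accelerator feedback, which the abstract does not specify.

```python
import statistics


def quantize(weights, bits):
    """Uniform symmetric quantization to a signed `bits`-bit grid, then dequantize."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def layer_state(weights, bits):
    """Toy state: weight standard deviation plus the current bit width."""
    return (statistics.pstdev(weights), bits)


def allowed_bits(candidates, layer_cost, budget):
    """Keep only the bit widths whose (hypothetical) cost fits the budget."""
    return [b for b in candidates if layer_cost(b) <= budget]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Made-up weights for a single layer.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
err_2bit = mse(weights, quantize(weights, 2))
err_8bit = mse(weights, quantize(weights, 8))

# Assumed cost model: cost grows linearly with bit width.
actions = allowed_bits([2, 4, 8], layer_cost=lambda b: b * 1000, budget=4000)
print(err_2bit, err_8bit, actions)
```

In a full pipeline the reward would come from the accuracy of the retrained quantized model, as the abstract states. The quantization error here is only a cheap stand-in to show that lower bit widths lose more information.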
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52272332 and 51578199), the Heilongjiang Provincial Natural Science Foundation (Grant No. YQ2021E031), and the Fundamental Research Funds for the Central Universities (Grant No. HIT.OCEF.2022026).
Abstract: Reinforcement learning (RL) can free automated vehicles (AVs) from car-following constraints and allow broader exploration of mixed-flow behavior. This study uses deep RL (DRL) for the longitudinal control of AVs and designs a multi-level objectives framework for AV trajectory decision-making based on multi-agent DRL. A saturated signalized intersection is taken as the research object, both to seek the upper limit of traffic efficiency and to achieve control toward specific targets. The simulation results demonstrate the convergence of the proposed framework in complex scenarios. When throughput is the primary objective and emissions the secondary objective, both indicators grow linearly with increasing market penetration rate (MPR). Compared with an MPR of 0%, throughput increases by 69.2% at an MPR of 100%. Compared with linear adaptive cruise control (LACC) at the same MPR, emissions can also be reduced by up to 78.8%. With throughput held fixed, the emission benefits relative to LACC grow nearly linearly as MPR increases, reaching 79.4% at 80% MPR. The study uses the experimental results to analyze the behavioral changes of mixed flow and the mechanism by which mixed autonomy improves traffic efficiency. The proposed method is flexible and serves as a valuable tool for exploring the behavior of mixed flow and the patterns of mixed autonomy.
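The abstract does not describe how the primary and secondary objectives are combined, so the sketch below shows just one common reading of a multi-level objective: a lexicographic comparison in which throughput decides first and emissions only break ties. The functions `prefer` and `best_outcome`, the tolerance parameter, and all numbers are hypothetical illustrations, not the study's reward design or data.

```python
def prefer(a, b, throughput_tol=0.0):
    """Return True if outcome `a` is preferred over outcome `b`.

    Each outcome is a (throughput, emissions) pair. Throughput is the
    primary objective (higher is better). Emissions are consulted only
    when the throughputs differ by no more than `throughput_tol`.
    """
    thr_a, emi_a = a
    thr_b, emi_b = b
    if abs(thr_a - thr_b) > throughput_tol:
        return thr_a > thr_b
    return emi_a < emi_b


def best_outcome(outcomes, throughput_tol=0.0):
    """Scan the outcomes once and keep the preferred one."""
    best = outcomes[0]
    for candidate in outcomes[1:]:
        if prefer(candidate, best, throughput_tol):
            best = candidate
    return best


# Made-up (throughput in veh/h, emissions in g) pairs for illustration.
outcomes = [(1500.0, 900.0), (1600.0, 950.0), (1600.0, 700.0)]
best = best_outcome(outcomes)
print(best)
```

With a nonzero tolerance the comparison is no longer strictly transitive, so the result of a single scan can depend on the order of the candidates. A weighted sum of the two terms is a common alternative when a single scalar reward is needed.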