Funding: supported by the National Natural Science Foundation of China (No. 62002016), the Science and Technology Development Fund, Macao S.A.R. (No. 0137/2019/A3), the Beijing Natural Science Foundation (No. 9204028), and the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515111165).
Abstract: This paper develops deep reinforcement learning (DRL) algorithms for optimizing the operation of a home energy system consisting of photovoltaic (PV) panels, a battery energy storage system, and household appliances. Model-free DRL algorithms can efficiently handle the difficulty of energy system modeling and the uncertainty of PV generation. However, the discrete-continuous hybrid action space of the considered home energy system challenges existing DRL algorithms designed for either discrete or continuous actions alone. Thus, a mixed deep reinforcement learning (MDRL) algorithm is proposed, which integrates the deep Q-learning (DQL) algorithm and the deep deterministic policy gradient (DDPG) algorithm. The DQL algorithm deals with discrete actions, while the DDPG algorithm handles continuous actions. The MDRL algorithm learns an optimal strategy through trial-and-error interactions with the environment. However, unsafe actions that violate system constraints can incur high cost. To handle this problem, a safe-MDRL algorithm is further proposed. Simulation studies demonstrate that the proposed MDRL algorithm can efficiently handle the challenge of the discrete-continuous hybrid action space in home energy management. Compared with benchmark algorithms on the test dataset, the proposed MDRL algorithm reduces the operation cost while maintaining human thermal comfort. Moreover, the safe-MDRL algorithm greatly reduces the loss of thermal comfort incurred by the MDRL algorithm during the learning stage.
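The hybrid action selection at the core of such an MDRL scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: the state layout, action counts, power limit, and the linear stand-ins for the DQL and DDPG networks are all assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4    # e.g. PV output, load, price, battery SoC (assumed layout)
N_DISCRETE = 3   # e.g. appliance scheduling choices (assumed)
P_MAX = 2.0      # battery (dis)charge power limit in kW (assumed)

# Linear stand-ins for the DQL value network and the DDPG actor network.
W_q = rng.normal(size=(N_DISCRETE, STATE_DIM))
W_mu = rng.normal(size=STATE_DIM)

def select_action(state, epsilon=0.1):
    """Return (discrete_action, continuous_action) for one state."""
    # DQL branch: epsilon-greedy over the discrete appliance actions.
    if rng.random() < epsilon:
        a_d = int(rng.integers(N_DISCRETE))
    else:
        a_d = int(np.argmax(W_q @ state))
    # DDPG branch: deterministic policy squashed into [-P_MAX, P_MAX].
    a_c = float(P_MAX * np.tanh(W_mu @ state))
    return a_d, a_c

state = np.array([1.5, 0.8, 0.3, 0.5])
a_d, a_c = select_action(state, epsilon=0.0)
print(a_d, a_c)
```

The two branches share the same state input but produce one discrete and one continuous component per step, which is what distinguishes the hybrid action space from a purely discrete or purely continuous one.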
Funding: supported by the Guangdong Basic and Applied Basic Research Foundation (2024A1515011936) and the National Natural Science Foundation of China (62320106008).
Abstract: The concept of reward is fundamental in reinforcement learning and has a wide range of applications in the natural and social sciences. Seeking an interpretable reward for decision-making, which largely shapes the system's behavior, has always been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which describe many phenomena governed by physical laws. We find that the discrete-time reward leads to the extraction of the unique continuous-time decision law and improves computational efficiency by dropping the integral operator that appears in classical results with integral rewards. We apply this finding to solve output-feedback design problems in power systems. The results reveal that our approach removes an intermediate stage of identifying dynamical models. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, providing a computational tool to understand and modify the behavior of large-scale engineering systems using the optimal learned decision.
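The idea that a sampled, discrete-time reward can recover a continuous-time value without applying an integral operator can be illustrated numerically. The sketch below is a hedged toy example, not the paper's method: it uses an assumed scalar linear system dx/dt = a*x + b*u under a fixed stabilizing feedback u = -k*x, with all constants chosen for illustration, and checks the accumulated discrete-time reward against the closed-form continuous-time value coefficient.

```python
# Toy check: for dx/dt = a*x + b*u with quadratic stage cost and fixed
# feedback u = -k*x, the closed-loop value is V(x0) = p * x0**2, where p
# solves the scalar Lyapunov equation 2*(a - b*k)*p + q + r_w*k**2 = 0.
# Accumulating sampled (discrete-time) rewards reproduces p without
# evaluating any integral operator. All constants are assumptions.

a, b = -1.0, 0.5
q, r_w = 1.0, 1.0           # stage-cost weights (assumed)
k = 0.8                     # stabilizing gain: a - b*k = -1.4 < 0
dt, T = 1e-3, 5.0

# Closed-form value coefficient from the Lyapunov equation above.
p = (q + r_w * k**2) / (-2.0 * (a - b * k))

x, cost = 1.0, 0.0
for _ in range(int(T / dt)):
    u = -k * x
    cost += (q * x**2 + r_w * u**2) * dt   # discrete-time reward sample
    x += (a * x + b * u) * dt              # Euler step of the dynamics

print(p, cost)
```

With these constants the accumulated sampled reward agrees with p to within the discretization error of the Euler step, which shrinks as dt decreases.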
Funding: Project (50575076) supported by the National Natural Science Foundation of China; Project (36547) supported by the Natural Science Foundation of Guangdong Province, China.
Abstract: The dry friction and wear behavior of 7075 Al alloy reinforced with a 3D continuous SiC ceramic network against Cr12 steel was studied with an oscillating dry friction and wear tester at 70 ℃ for 30 min over a load range of 40-100 N. The experimental results show that both abrasive wear and oxidation wear mechanisms are present in the 3D continuous SiC/7075 Al composite. The 3D continuous ceramic network reinforcement protects the composite from the third-body wear that usually occurs in traditional particle-reinforced composites. Under low load, the composite with low volume fraction of ceramic reinforcement exhibits better wear resistance owing to its homogeneous reinforcement distribution and small pore size; conversely, under high load, the composite with high reinforcement volume fraction exhibits better wear resistance because of its coarse frame size. The hard SiC frame is the main cause of wear of the Cr12 steel counterface, and the frame with high volume fraction corresponds to high Fe content.