Abstract: Q-learning is a popular temporal-difference reinforcement learning algorithm that typically stores state values explicitly in lookup tables. This tabular implementation has been proven to converge to the optimal solution, but it is often beneficial to use a function-approximation system, such as a deep neural network, to estimate state values. It has previously been observed that Q-learning can be unstable when using value function approximation or when operating in a stochastic environment, and this instability can adversely affect the algorithm's ability to maximize its returns. In this paper, we present a new algorithm, Multi Q-learning, that attempts to overcome the instability seen in Q-learning. We test our algorithm on a 4 × 4 grid-world with different stochastic reward functions, using various deep neural networks and convolutional networks. Our results show that in most cases Multi Q-learning outperforms Q-learning, achieving average returns up to 2.5 times higher than Q-learning and a standard deviation of state values as low as 0.58.
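The abstract does not spell out the Multi Q-learning update rule itself. As a rough, hypothetical sketch, the snippet below generalizes tabular Double Q-learning to N independent estimators: at each step a randomly chosen Q-table is updated toward a target bootstrapped from the average of the remaining tables. All names and hyperparameters (N_ESTIMATORS, ALPHA, GAMMA, multi_q_update) are illustrative assumptions, not the authors' implementation, which uses deep and convolutional networks rather than lookup tables.

```python
# Hypothetical tabular Multi Q-learning sketch (not the authors' exact code).
# Assumption: N independent Q-tables; each step one table is chosen at random
# and updated toward a target built from the average of the other tables,
# generalizing the Double Q-learning idea to N estimators.
import random
from collections import defaultdict

N_ESTIMATORS = 4          # number of Q-tables (assumed hyperparameter)
ALPHA, GAMMA = 0.1, 0.99  # learning rate and discount factor (assumed)

# Q[i][(state, action)] -> value estimate of the i-th table
Q = [defaultdict(float) for _ in range(N_ESTIMATORS)]

def greedy_action(state, actions):
    """Act greedily with respect to the average of all Q-tables."""
    return max(actions, key=lambda a: sum(q[(state, a)] for q in Q) / N_ESTIMATORS)

def multi_q_update(state, action, reward, next_state, actions, done):
    """Update one randomly chosen table, bootstrapping from the other tables."""
    i = random.randrange(N_ESTIMATORS)
    others = [q for j, q in enumerate(Q) if j != i]
    if done:
        target = reward
    else:
        # Average the remaining tables' estimate of the greedy next action.
        best_a = greedy_action(next_state, actions)
        target = reward + GAMMA * sum(q[(next_state, best_a)] for q in others) / len(others)
    Q[i][(state, action)] += ALPHA * (target - Q[i][(state, action)])
```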
Funding: This work was supported by the National Natural Science Foundation of China (No. 51807024).
Abstract: This paper presents a smart energy community management approach that is capable of implementing peer-to-peer (P2P) trading and managing household energy storage systems. A smart residential community concept is proposed, consisting of domestic users and a local energy pool, in which users are free to trade with the local energy pool and enjoy cheap renewable energy without installing new generation equipment. The local energy pool harvests surplus energy from users and renewable resources and, at the same time, sells energy at a price higher than the Feed-in Tariff (FIT) but lower than the retail price. To encourage participation in local energy trading, the electricity price of the energy pool is determined by a real-time demand/supply ratio. Under this pricing mechanism, the retail price, the users, and the renewable energy output all affect the electricity price, which leads to higher consumer profits and better utilization of renewable energy. The proposed energy trading process is modeled as a Markov Decision Process (MDP), and a reinforcement learning algorithm is adopted to find the optimal decisions in the MDP because of its strong performance in ongoing, model-free tasks. In addition, a fuzzy inference system makes it possible to apply Q-learning to continuous state-space problems (Fuzzy Q-learning), given the infinitely many possible states in the energy trading process. To evaluate the performance of the proposed demand-side management system, a numerical analysis is conducted on a community, comparing electricity costs before and after adopting the proposed energy management system.
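The abstract states only that the local price is driven by the real-time demand/supply ratio and sits between the FIT and the retail price; it does not give the formula. Purely as an illustration of that constraint, the sketch below linearly interpolates between the two tariffs. The function name, the default tariff values, and the linear form are assumptions, not the paper's pricing model.

```python
# Hypothetical illustration of a demand/supply-ratio pricing rule for the local
# energy pool. NOT the paper's formula: the linear interpolation and the default
# tariff values (FIT floor 0.05, retail ceiling 0.15 per kWh) are assumptions.
def local_price(demand_kwh: float, supply_kwh: float,
                fit_price: float = 0.05, retail_price: float = 0.15) -> float:
    """Price per kWh, kept between the FIT (floor) and the retail price (ceiling)."""
    if supply_kwh <= 0:
        return retail_price                     # no local supply: buy at retail
    dsr = demand_kwh / supply_kwh               # real-time demand/supply ratio
    # Abundant local supply (low ratio) pulls the price toward the FIT floor;
    # scarcity (high ratio) pulls it toward the retail ceiling.
    return fit_price + (retail_price - fit_price) * min(dsr, 1.0)

# Example: plentiful surplus (demand/supply ratio 0.4) yields a cheap local price.
print(local_price(demand_kwh=40.0, supply_kwh=100.0))   # 0.09
```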