Abstract: A recent work has shown that using an ion trap quantum processor can speed up the decision making of a reinforcement learning agent. Its quantum advantage is observed when the external environment changes and the agent then needs to relearn. One characteristic of this quantum hardware discovered in that study is that it tends to overestimate the values used to determine the actions the agent will take. IBM’s five-qubit superconducting quantum processor is a popular quantum platform. The aims of our study are twofold. First, we want to identify the hardware characteristics of IBM’s 5Q quantum computer when running this learning agent, compared with the ion trap processor. Second, through careful analysis, we observe that the quantum circuit employed in the ion trap processor for this agent can be simplified. When tested on IBM’s 5Q quantum processor, our simplified circuit demonstrates enhanced performance over the original circuit on one of the hard learning tasks investigated in the previous work. We also use IBM’s quantum simulator when a good baseline is needed for comparing performances. As more and more quantum hardware devices move out of the laboratory and become generally available for public use, our work emphasizes that the features and constraints of quantum hardware can take a toll on the performance of quantum algorithms.
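The paper’s specific simplified circuit is not reproduced in this abstract, but the kind of simplification it refers to can be illustrated generically: fewer gates mean shallower circuits and less exposure to hardware noise. The sketch below, which uses Qiskit’s transpiler on a toy circuit with a redundant CNOT pair, is only an illustration of that idea, not the agent circuit from the study.

```python
from qiskit import QuantumCircuit, transpile

# Toy 2-qubit circuit with a redundant gate pair (two adjacent CNOTs
# cancel). This is NOT the agent circuit from the paper, just an
# illustration of circuit-level simplification.
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.cx(0, 1)   # cancels the previous CNOT
qc.rz(0.3, 1)

# Optimization level 3 runs cancellation passes that remove the pair.
simplified = transpile(qc, optimization_level=3)
print("original depth:  ", qc.depth())
print("simplified depth:", simplified.depth())
```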
Abstract: The advantage of quantum computers over classical computers fuels the recent trend of developing machine learning algorithms on quantum computers, which can potentially lead to breakthroughs and new learning models in this area. The aim of our study is to explore deep quantum reinforcement learning (RL) on photonic quantum computers, which process information stored in the quantum states of light. These quantum computers can naturally represent continuous variables, making them an ideal platform for creating quantum versions of neural networks. Using quantum photonic circuits, we implement Q learning and actor-critic algorithms with multilayer quantum neural networks and test them in the grid world environment. Our experiments show that 1) these quantum algorithms can solve the RL problem and 2) compared with a single layer, three-layer quantum networks improve the learning of both algorithms in terms of the rewards collected. In summary, our findings suggest that having more layers in deep quantum RL can enhance the learning outcome.
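As an illustration of what a layered continuous-variable quantum network for Q values can look like, here is a minimal PennyLane sketch restricted to Gaussian gates so it runs on the built-in simulator. The feature encoding, layer structure, and two-feature grid-world state are illustrative assumptions, not the authors’ exact architecture, and the paper’s photonic networks would also include non-Gaussian gates.

```python
import pennylane as qml
from pennylane import numpy as np

# Two optical modes; each mode's mean photon number serves as the
# Q value of one of two actions (an illustrative choice).
dev = qml.device("default.gaussian", wires=2)

@qml.qnode(dev)
def q_values(state_features, weights):
    # Encode a (hypothetical) 2-feature grid-world state by displacement.
    qml.Displacement(state_features[0], 0.0, wires=0)
    qml.Displacement(state_features[1], 0.0, wires=1)
    for w in weights:  # one iteration = one "layer"
        qml.Beamsplitter(w[0], w[1], wires=[0, 1])
        qml.Squeezing(w[2], 0.0, wires=0)
        qml.Squeezing(w[3], 0.0, wires=1)
        qml.Displacement(w[4], 0.0, wires=0)
        qml.Displacement(w[5], 0.0, wires=1)
    return [qml.expval(qml.NumberOperator(0)), qml.expval(qml.NumberOperator(1))]

weights = np.random.uniform(0, 0.1, (3, 6))  # three layers, six parameters each
print(q_values(np.array([0.5, 1.0]), weights))
```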
Abstract: Traditional reinforcement learning (RL) uses the expected return, the expected value of the cumulative random rewards, to train an agent to learn an optimal policy. However, recent research indicates that learning the distribution over returns has distinct advantages over learning their expected value, as seen in different RL tasks. The shift from the expectation of returns in traditional RL to the distribution over returns in distributional RL has provided new insights into the dynamics of RL. This paper builds on our recent work investigating the quantum approach to RL. We implement quantile regression (QR) distributional Q learning with a quantum neural network. This quantum network is evaluated in a grid world environment with different numbers of quantiles, illustrating their influence on the algorithm’s learning. It is also compared with standard quantum Q learning in a Markov Decision Process (MDP) chain, which demonstrates that quantum QR distributional Q learning can explore the environment more efficiently than standard quantum Q learning. Efficient exploration and the balancing of exploitation and exploration are major challenges in RL. Previous work has shown that more informative actions can be taken from a distributional perspective. Our findings suggest another cause for its success: the enhanced performance of distributional RL can be partially attributed to its superior ability to explore the environment efficiently.
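The quantile regression update at the heart of QR distributional Q learning can be written compactly. The sketch below is the classical skeleton with N quantile estimates per state-action pair; in the paper those estimates would come from the quantum neural network, and the hyperparameters here are placeholders.

```python
import numpy as np

N = 8                                    # number of quantiles
taus = (2 * np.arange(N) + 1) / (2 * N)  # quantile midpoints tau_i = (2i+1)/(2N)
alpha, gamma = 0.1, 0.99                 # placeholder learning rate and discount

def qr_update(theta, reward, theta_next):
    """One quantile-regression TD update for a single (s, a) pair.

    theta, theta_next: length-N quantile estimates of the return
    distribution at the current and next state-action pair.
    """
    targets = reward + gamma * theta_next          # N distributional Bellman targets
    u = targets[None, :] - theta[:, None]          # pairwise TD errors u[i, j]
    # Gradient of the quantile loss, averaged over targets: tau_i - 1{u < 0}.
    grad = (taus[:, None] - (u < 0)).mean(axis=1)
    return theta + alpha * grad
```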
Abstract: Applying quantum computing techniques to machine learning has attracted widespread attention recently, and quantum machine learning has become a hot research topic. There are three major categories of machine learning: supervised, unsupervised, and reinforcement learning (RL). However, quantum RL has made the least progress compared with the other two areas. In this study, we implement the well-known RL algorithm Q learning with a quantum neural network and evaluate it in the grid world environment. RL is learning through interactions with the environment, with the aim of discovering a strategy that maximizes the expected cumulative rewards. RL problems bring unique challenges to the study, with their sequential nature of learning, potentially long-delayed reward signals, and large or infinite state and action spaces. This study extends our previous work on solving the contextual bandit problem with a quantum neural network, where the reward signals are immediate after each action.
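For reference, the Q learning loop being implemented has a simple classical skeleton, shown below with a table in place of the quantum neural network; the environment interface (env_step) and the grid size are hypothetical placeholders.

```python
import numpy as np

# Classical skeleton of Q learning (the paper replaces the table lookup
# with a quantum neural network). A 4x4 grid world with 4 actions is an
# illustrative assumption.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1

def q_learning_step(s, env_step, rng):
    # Epsilon-greedy action selection.
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, reward, done = env_step(s, a)  # hypothetical environment call
    # TD target uses the max over next-state action values.
    target = reward if done else reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return s_next, done
```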
Abstract: We show the practicality of two existing meta-learning algorithms, Model-Agnostic Meta-Learning and Fast Context Adaptation Via Meta-learning, using an evolutionary strategy for parameter optimization, and propose two novel quantum adaptations of those algorithms using continuous quantum neural networks, for learning to trade portfolios of stocks. The goal of meta-learning is to train a model on a variety of tasks such that it can solve new learning tasks using only a small number of training samples. In our classical approach, we trained our meta-learning models on a variety of portfolios, each containing 5 Consumer Cyclical stocks randomly sampled from a pool of 60. In our quantum approach, we trained our quantum meta-learning models on a simulated quantum computer with portfolios containing 2 randomly sampled Consumer Cyclical stocks. Our findings suggest that both classical models can learn a new portfolio with 0.01% of the number of training samples needed to learn the original portfolios and can achieve comparable performance, within 0.1% return on investment of the Buy and Hold strategy. We also show that our much smaller quantum meta-learned models, with only 60 model parameters and 25 training epochs, have a learning pattern similar to our much larger classical meta-learned models, which have over 250,000 model parameters and 2500 training epochs. Given these findings, we also discuss the benefits of scaling up our experiments from a simulated quantum computer to a real quantum computer.
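A minimal sketch of the evolutionary-strategy outer loop used for parameter optimization is given below; the inner-loop adaptation on a sampled portfolio task is abstracted into a placeholder function (loss_after_adaptation), and all hyperparameters are illustrative, not the values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=60)        # meta-parameters (60 chosen for illustration)
sigma, lr, pop = 0.1, 0.02, 50     # placeholder ES hyperparameters

def es_meta_step(theta, loss_after_adaptation):
    """One evolution-strategy update of the meta-parameters.

    loss_after_adaptation is a hypothetical placeholder: it should
    inner-adapt the given parameters on a sampled portfolio task and
    return the post-adaptation loss.
    """
    noise = rng.normal(size=(pop, theta.size))
    losses = np.array([loss_after_adaptation(theta + sigma * n) for n in noise])
    # Lower loss = higher fitness; standardize for a stable update.
    fitness = -(losses - losses.mean()) / (losses.std() + 1e-8)
    return theta + lr / (pop * sigma) * noise.T @ fitness
```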
Abstract: Artificial intelligence (AI) has permeated all aspects of our lives today. However, a critical bottleneck in making AI behave like real AI lies in the speed of computing. Quantum computers employ the peculiar and unique properties of quantum states, such as superposition, entanglement, and interference, to process information in ways that classical computers cannot. As a new paradigm of computation, quantum computers can perform tasks intractable for classical processors, thus providing a quantum leap in AI research and making the development of real AI a possibility. In this regard, quantum machine learning not only enhances the classical machine learning approach but, more importantly, provides an avenue to explore new machine learning models that have no classical counterparts. Qubit-based quantum computers cannot naturally represent the continuous variables commonly used in machine learning, since the measurement outputs of qubit-based circuits are generally discrete. Therefore, a continuous-variable (CV) quantum architecture based on a photonic quantum computing model is selected for our study. In this work, we employ machine learning and optimization to create photonic quantum circuits that can solve the contextual multi-armed bandit problem, a problem in the domain of reinforcement learning, demonstrating that quantum reinforcement learning algorithms can be learned by a quantum device.
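The contextual multi-armed bandit itself has a compact classical skeleton, sketched below; in this work the action choice would come from a trained photonic circuit rather than the running value estimates used here, and the context/reward interface (pull_arm) is a hypothetical stand-in.

```python
import numpy as np

# Classical skeleton of the contextual bandit task: each round receives a
# context, pulls one arm, and observes an immediate reward (no delayed
# credit assignment). Context count and arm count are illustrative.
rng = np.random.default_rng(1)
n_contexts, n_arms = 4, 4
W = np.zeros((n_contexts, n_arms))       # per-context arm-value estimates
counts = np.zeros((n_contexts, n_arms))
eps = 0.1

def bandit_step(context, pull_arm):
    # Epsilon-greedy choice over the current value estimates.
    a = rng.integers(n_arms) if rng.random() < eps else int(np.argmax(W[context]))
    reward = pull_arm(context, a)        # hypothetical environment call
    # Incremental mean update of the chosen arm's value.
    counts[context, a] += 1
    W[context, a] += (reward - W[context, a]) / counts[context, a]
    return reward
```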
Abstract: In this paper, a collection of value-based quantum reinforcement learning algorithms is introduced that use Grover’s algorithm to update the policy, which is stored as a superposition over qubits associated with each possible action; their parameters are also explored. These algorithms fall into two classes: one that uses state value functions V(s) and a new class that uses action value functions Q(s,a). The new Q(s,a)-based quantum algorithms are found to converge faster than the V(s)-based algorithms, and in general the quantum algorithms converge in fewer iterations than their classical counterparts, netting larger returns during training. This is because the Q(s,a) algorithms are more precise than those based on V(s), so updates are incorporated into the value function more efficiently. This effect is further enhanced by the observation that the Q(s,a)-based algorithms may be trained with higher learning rates. These algorithms are then extended by adding multiple value functions, which are observed to allow larger learning rates and to improve convergence in environments with stochastic rewards; the latter is further improved by the probabilistic nature of the quantum algorithms. Finally, the quantum algorithms were found to use less CPU time overall than their classical counterparts, meaning that their benefits may be realized even without a full quantum computer.
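The core of the Grover-based policy update can be sketched directly on a state vector over actions: an oracle flips the amplitude of the action currently favored by the value function, and a diffusion step inverts all amplitudes about their mean, boosting that action’s probability. The NumPy sketch below simulates one such amplification step and is not the paper’s exact circuit.

```python
import numpy as np

def grover_update(policy, good_action, iterations=1):
    """Amplify the amplitude of good_action in a policy state vector."""
    psi = policy / np.linalg.norm(policy)
    for _ in range(iterations):
        psi[good_action] *= -1.0    # oracle: flip the marked amplitude
        psi = 2 * psi.mean() - psi  # diffusion: inversion about the mean
    return psi

psi = np.ones(4) / 2.0              # uniform superposition over 4 actions
psi = grover_update(psi, good_action=2)
print(np.abs(psi) ** 2)             # action 2 is now the most probable
```

With four actions, a single Grover iteration drives the marked action’s probability to one, which is why the number of iterations (and hence the strength of the policy update) is itself a tunable parameter in such schemes.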