Dong Daoyi, Chen Chunlin, Li Hanxiong, Tarn Tzyh-Jong
Key Laboratory of Systems and Control, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.
IEEE Trans Syst Man Cybern B Cybern. 2008 Oct;38(5):1207-20. doi: 10.1109/TSMCB.2008.925743.
New representations and computation mechanisms are key to machine learning, particularly to learning in unknown probabilistic environments. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework for a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Related characteristics of QRL, such as convergence, optimality, and the balance between exploration and exploitation, are also analyzed. The analysis shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an exploration of the application of quantum computation to artificial intelligence.
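The action-selection idea in the abstract can be illustrated with a small classical simulation. The sketch below is not the paper's algorithm: it assumes real-valued amplitudes, and the multiplicative update in `reinforce` (with the hypothetical gain `k`) is a simplified stand-in for whatever amplitude update the authors use. All function names are illustrative. It shows only the two ideas stated above: an eigen action is obtained by a simulated measurement with probability equal to the squared amplitude, and rewards shift the amplitudes.

```python
import math
import random


def uniform_amplitudes(n_actions):
    """Equal superposition: every eigen action gets amplitude 1/sqrt(n)."""
    return [1.0 / math.sqrt(n_actions)] * n_actions


def probabilities(amplitudes):
    """Measurement probabilities are the squared amplitudes (real-valued here)."""
    return [a * a for a in amplitudes]


def measure(amplitudes, rng):
    """Simulated collapse: sample one eigen action with probability amplitude^2."""
    indices = range(len(amplitudes))
    return rng.choices(indices, weights=probabilities(amplitudes), k=1)[0]


def reinforce(amplitudes, action, reward, k=0.1):
    """Assumed update rule (not from the paper): scale the observed action's
    amplitude by (1 + k * reward), clipped at zero, then renormalise so the
    squared amplitudes still sum to 1."""
    updated = list(amplitudes)
    updated[action] *= max(0.0, 1.0 + k * reward)
    norm = math.sqrt(sum(a * a for a in updated))
    if norm == 0.0:
        # Degenerate case: keep the previous amplitudes rather than divide by zero.
        return list(amplitudes)
    return [a / norm for a in updated]


if __name__ == "__main__":
    rng = random.Random(0)
    amps = uniform_amplitudes(4)
    # Toy environment (assumption): only action 2 is rewarded.
    for _ in range(200):
        action = measure(amps, rng)
        reward = 1.0 if action == 2 else 0.0
        amps = reinforce(amps, action, reward)
    print([round(p, 3) for p in probabilities(amps)])
```

Because unrewarded actions keep a nonzero amplitude, they can still be observed, which is one simple way to picture the exploration and exploitation tradeoff the abstract attributes to the probability amplitude.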