Morijiri Kensei, Mihana Takatomo, Kanno Kazutaka, Naruse Makoto, Uchida Atsushi
Department of Information and Computer Sciences, Saitama University, 255 Shimo-okubo, Sakura-ku, Saitama City, Saitama, 338-8570, Japan.
Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
Sci Rep. 2022 May 16;12(1):8073. doi: 10.1038/s41598-022-12155-y.
Decision making using photonic technologies has been intensively researched for solving the multi-armed bandit problem, which is fundamental to reinforcement learning. However, these technologies have not yet been extended to large-scale multi-armed bandit problems. In this study, we numerically investigate decision making for large-scale multi-armed bandit problems by controlling the biases of chaotic temporal waveforms generated in semiconductor lasers with optical feedback. Each chaotic temporal waveform generated by the semiconductor lasers is assigned to one slot machine (or choice) in the multi-armed bandit problem. The biases added to the amplitudes of the chaotic waveforms are adjusted based on rewards using the tug-of-war method, and the slot machine whose biased chaotic waveform has the maximum amplitude is selected. The scaling properties of correct decision making are examined by increasing the number of slot machines up to 1024, and the scaling exponent of the power-law distribution is found to be 0.97. We demonstrate that the proposed method outperforms existing software algorithms in terms of this scaling exponent. This result paves the way for photonic decision making in large-scale multi-armed bandit problems using photonic accelerators.
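The selection rule summarized in the abstract (add a bias to each chaotic waveform, pick the machine with the largest biased amplitude, and update the biases from rewards in a tug-of-war fashion) can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: the logistic map with r = 4 stands in for the sampled laser-chaos waveforms, which the paper obtains from numerical simulation of semiconductor lasers with optical feedback, and the zero-sum bias update with the parameters `delta`, `omega`, and `alpha` is a generic tug-of-war rule chosen for illustration, not the paper's exact rule or values. The function names `logistic_step` and `tug_of_war_bandit` are hypothetical.

```python
import random


def logistic_step(x: float, r: float = 4.0) -> float:
    """One logistic-map iteration: a stand-in (assumption) for one laser-chaos sample."""
    return r * x * (1.0 - x)


def tug_of_war_bandit(probs, steps=3000, delta=0.1, omega=1.0, alpha=0.999, seed=0):
    """Choose among len(probs) Bernoulli slot machines using biased chaotic amplitudes.

    Hypothetical illustration parameters:
      delta - size of one bias update
      omega - weight of a loss relative to a win
      alpha - memory factor that slowly forgets old biases
    Returns the list of selected machine indices, one per play.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two slot machines")
    rng = random.Random(seed)
    # One independent chaotic sequence per slot machine.
    x = [rng.uniform(0.01, 0.99) for _ in range(n)]
    bias = [0.0] * n
    choices = []
    for _ in range(steps):
        x = [logistic_step(v) for v in x]
        # Reseed any sequence that rounding has pushed onto 0 or 1, where the map gets stuck.
        x = [v if 0.0 < v < 1.0 else rng.uniform(0.01, 0.99) for v in x]
        # Select the machine whose chaotic amplitude plus bias is largest.
        sel = max(range(n), key=lambda i: x[i] + bias[i])
        choices.append(sel)
        won = rng.random() < probs[sel]
        # Zero-sum tug-of-war update: the selected machine moves by `step` and each of
        # the other n - 1 machines moves by -step / (n - 1), so the biases sum to zero.
        step = delta if won else -omega * delta
        for i in range(n):
            bias[i] = alpha * bias[i] + (step if i == sel else -step / (n - 1))
    return choices


if __name__ == "__main__":
    # Hypothetical reward probabilities; machine 3 is the best choice.
    picks = tug_of_war_bandit([0.1, 0.3, 0.4, 0.7])
    last = picks[-500:]
    print("share of machine 3 in the last 500 plays:", last.count(3) / len(last))
```

In this sketch a machine's bias drifts upward on average only when its reward probability exceeds omega / (1 + omega), which is 0.5 for `omega = 1`, so in the example only the 0.7 machine is reinforced and soon wins every comparison. When several machines exceed that threshold, a fixed `omega` can let the sketch settle on a suboptimal machine; the classic two-machine tug-of-war avoids this by setting Ω from estimates of the reward probabilities.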