Center for Nanoscale Materials, Argonne National Laboratory, Lemont, IL, 60439, USA.
Department of Mechanical and Industrial Engineering, University of Illinois, Chicago, IL, 60607, USA.
Nat Commun. 2022 Jan 18;13(1):368. doi: 10.1038/s41467-021-27849-6.
Reinforcement learning (RL) approaches that combine a tree search with deep learning have found remarkable success in searching exorbitantly large, albeit discrete, action spaces, as in chess, Shogi, and Go. Many real-world materials discovery and design applications, however, involve multi-dimensional search problems and learning domains that have continuous action spaces. Exploring high-dimensional potential energy models of materials is one example. Traditionally, these searches are time-consuming (often several years for a single bulk system) and are driven by human intuition and/or expertise, and more recently by global/local optimization searches, which suffer from convergence issues and/or scale poorly with search dimensionality. Here, in a departure from discrete-action and other gradient-based approaches, we introduce an RL strategy based on decision trees that incorporates modified rewards for improved exploration, efficient sampling during playouts, and a "window scaling scheme" for enhanced exploitation, enabling efficient and scalable search over continuous action spaces. Using high-dimensional artificial landscapes and control RL problems, we successfully benchmark our approach against popular global optimization schemes and state-of-the-art policy gradient methods, respectively. We demonstrate its efficacy in parameterizing potential models (physics-based and high-dimensional neural networks) for 54 different elemental systems across the periodic table as well as alloys. We analyze error trends across different elements in the latent space and trace their origin to elemental structural diversity and the smoothness of the element energy surface. Broadly, our RL strategy will be applicable to many other physical science problems involving search over continuous action spaces.
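To make the "window scaling" idea concrete, the following minimal Python sketch runs random playouts inside a search window over a one-dimensional toy objective and then recenters and shrinks the window around the best action found, trading exploration for exploitation as the search deepens. Everything here is an illustrative assumption made for this sketch: the toy objective, the Node class, the sample count, and the 0.5 shrink factor are not the authors' implementation, which operates on high-dimensional potential-model parameter spaces within a full decision-tree search.

```python
import math
import random

def objective(x):
    # Toy 1-D landscape to minimize (stand-in for a potential-fitting loss).
    return (x - 1.7) ** 2 + 0.3 * math.sin(8 * x)

class Node:
    """One search window over a continuous action interval (illustrative)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi       # current window bounds
        self.best_x = None              # best action sampled so far
        self.best_f = float("inf")      # objective value at best_x

    def playout(self, n_samples=8):
        # Playout step: sample actions uniformly inside the window and
        # keep the best one.
        for _ in range(n_samples):
            x = random.uniform(self.lo, self.hi)
            f = objective(x)
            if f < self.best_f:
                self.best_f, self.best_x = f, x

    def scale_window(self, shrink=0.5):
        # "Window scaling": recenter the window on the best action seen
        # and shrink its width, sharpening exploitation around the optimum.
        half = shrink * (self.hi - self.lo) / 2.0
        self.lo = max(self.lo, self.best_x - half)
        self.hi = min(self.hi, self.best_x + half)

def search(lo=-5.0, hi=5.0, iterations=12):
    node = Node(lo, hi)
    for _ in range(iterations):
        node.playout()
        node.scale_window()
    return node.best_x, node.best_f

if __name__ == "__main__":
    x, f = search()
    print(f"best x = {x:.4f}, objective = {f:.4f}")
```

In this toy form, each iteration halves the window around the incumbent best action, so sampling density around the optimum roughly doubles per step; in the tree-search setting described in the abstract, an analogous per-node window refinement is what lets a discrete decision tree operate over a continuous action space.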