Lifar Mikhail S, Tereshchenko Andrei A, Bulgakov Aleksei N, Guda Sergey A, Guda Alexander A, Soldatov Alexander V
The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia.
I. I. Vorovich Institute of Mathematics, Mechanics, and Computer Science, Southern Federal University, 344090 Rostov-on-Don, Russia.
ACS Omega. 2024 Jun 20;9(26):27987-27997. doi: 10.1021/acsomega.3c10422. eCollection 2024 Jul 2.
Metal nanoparticles are widely used as heterogeneous catalysts to activate adsorbed molecules and reduce the energy barrier of the reaction. The reaction product yield depends on the interplay between elementary processes: adsorption, activation, desorption, and reaction. These processes, in turn, depend on the inlet gas composition, temperature, and pressure. At steady state, the active surface sites may be blocked by adsorbed reagents. A periodic regime may thus improve the yield, but the appropriate period and waveform are not known in advance. Dynamic control should account for changes in the surface and the gas atmosphere and adjust reaction parameters according to the current state of the system and its history. In this work, we applied a reinforcement learning algorithm to control CO oxidation on a palladium catalyst. The policy gradient algorithm was trained in a theoretical environment parametrized from experimental data. The algorithm learned to maximize the CO2 formation rate based on the CO and O2 partial pressures over several successive time steps. Within a unified approach, we found optimal stationary, periodic, and nonperiodic regimes for different problem formulations and gained insight into why the dynamic regime can be preferable. More broadly, this work contributes to popularizing the reinforcement learning approach in the field of catalytic science.
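The control loop described in the abstract (a policy gradient agent that picks inlet gas compositions to maximize the CO2 formation rate) can be illustrated with a minimal REINFORCE sketch. Everything below is a hypothetical toy: the two-coverage Langmuir-Hinshelwood-style surface model, the rate constants, the two candidate inlet compositions, and the linear softmax policy are all illustrative assumptions, not the authors' environment or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def surface_step(theta_co, theta_o, p_co, p_o2):
    """One Euler step of a toy Langmuir-Hinshelwood surface model.
    All rate constants are made up for illustration."""
    free = max(0.0, 1.0 - theta_co - theta_o)   # fraction of empty sites
    ads_co = 1.0 * p_co * free                  # CO adsorption
    ads_o = 0.5 * p_o2 * free                   # dissociative O2 adsorption, lumped
    rate = 2.0 * theta_co * theta_o             # CO2 formation rate (the reward)
    theta_co = float(np.clip(theta_co + 0.1 * (ads_co - rate), 0.0, 1.0))
    theta_o = float(np.clip(theta_o + 0.1 * (ads_o - rate), 0.0, 1.0))
    return theta_co, theta_o, rate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical action set: switch between a CO-rich and an O2-rich inlet feed.
ACTIONS = [(0.8, 0.2), (0.2, 0.8)]

def run_episode(w, horizon=50):
    """Roll out the softmax policy w (logits linear in [theta_co, theta_o, 1])."""
    theta_co, theta_o = 0.0, 0.0
    states, actions, rewards = [], [], []
    for _ in range(horizon):
        s = np.array([theta_co, theta_o, 1.0])
        probs = softmax(w @ s)
        a = rng.choice(len(ACTIONS), p=probs)
        theta_co, theta_o, r = surface_step(theta_co, theta_o, *ACTIONS[a])
        states.append(s); actions.append(a); rewards.append(r)
    return states, actions, rewards

def train(w, episodes=200, lr=0.5):
    """Plain REINFORCE with a mean-return baseline."""
    for _ in range(episodes):
        states, actions, rewards = run_episode(w)
        returns = np.cumsum(rewards[::-1])[::-1]      # undiscounted returns-to-go
        returns = returns - returns.mean()            # baseline
        for s, a, g in zip(states, actions, returns):
            probs = softmax(w @ s)
            grad = -np.outer(probs, s)                # d log pi(a|s) / dw
            grad[a] += s
            w += lr * g * grad / len(states)
    return w

w = train(np.zeros((len(ACTIONS), 3)))
```

Because the reaction term needs both adsorbates on the surface, a policy that alternates the two feeds tends to outperform a constant one in this toy model, which mirrors the abstract's point that periodic regimes can beat the stationary optimum.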