Schilter Oliver, Vaucher Alain, Schwaller Philippe, Laino Teodoro
IBM Research Europe Säumerstrasse 4 8803 Rüschlikon Switzerland
National Center for Competence in Research-Catalysis (NCCR-Catalysis) Switzerland.
Digit Discov. 2023 Apr 17;2(3):728-735. doi: 10.1039/d2dd00125j. eCollection 2023 Jun 12.
The need for more efficient catalytic processes is ever-growing, and so are the costs associated with experimentally searching chemical space to find promising new catalysts. Although density functional theory (DFT) and other atomistic models are well established for virtually screening molecules based on their simulated performance, data-driven approaches are emerging as indispensable tools for designing and improving catalytic processes. Here, we present a deep learning model capable of generating new catalyst-ligand candidates by self-learning meaningful structural features solely from their language representation and computed binding energies. We train a recurrent neural network-based Variational Autoencoder (VAE) to compress the molecular representation of the catalyst into a lower-dimensional latent space, in which a feed-forward neural network predicts the corresponding binding energy, which serves as the optimization function. The outcome of the optimization in the latent space is then reconstructed back into the original molecular representation. These trained models achieve state-of-the-art performance in catalyst binding energy prediction and catalyst design, with a mean absolute error of 2.42 kcal mol⁻¹ and an ability to generate 84% valid and novel catalysts.
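The latent-space optimization step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the latent size, the hidden width, the randomly initialized (untrained) predictor, the finite-difference gradient, and the choice to steer toward a target energy are all assumptions made for illustration. The VAE encoder and decoder are omitted, so the optimized vector is not decoded back into a molecule here.

```python
import math
import random

LATENT_DIM = 4   # hypothetical latent size (assumption, not from the paper)
HIDDEN_DIM = 8   # hypothetical hidden-layer width (assumption)


def make_predictor(seed=0):
    """Build a toy feed-forward net mapping a latent vector to a scalar 'energy'.

    Weights are random stand-ins for a trained binding-energy predictor.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(LATENT_DIM)] for _ in range(HIDDEN_DIM)]
    b1 = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
    w2 = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
    b2 = rng.uniform(-1, 1)

    def predict(z):
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, z)) + b)
            for row, b in zip(w1, b1)
        ]
        return sum(w * h for w, h in zip(w2, hidden)) + b2

    return predict


def numerical_gradient(f, z, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at point z."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        down = list(z)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def optimize_latent(predict, z0, target, lr=0.05, steps=200):
    """Move z so that predict(z) approaches target.

    Uses gradient descent on the squared error; a step is accepted only if it
    lowers the loss, otherwise the step size is halved.
    """
    def loss(z):
        return (predict(z) - target) ** 2

    z = list(z0)
    for _ in range(steps):
        grad = numerical_gradient(loss, z)
        candidate = [zi - lr * gi for zi, gi in zip(z, grad)]
        if loss(candidate) < loss(z):
            z = candidate
        else:
            lr *= 0.5
    return z


if __name__ == "__main__":
    predict = make_predictor()
    z_start = [0.0] * LATENT_DIM
    target_energy = predict([0.5] * LATENT_DIM)  # a value the toy net can reach
    z_opt = optimize_latent(predict, z_start, target_energy)
    print("start error:", abs(predict(z_start) - target_energy))
    print("final error:", abs(predict(z_opt) - target_energy))
```

In a full pipeline along the lines of the abstract, `z_start` would come from encoding a known catalyst and `z_opt` would be passed to the VAE decoder to obtain a candidate molecule. An automatic-differentiation framework would normally replace the finite-difference gradient used here.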