Tempke Robert, Musho Terence
Department of Mechanical & Aerospace Engineering, West Virginia University, Morgantown, WV, 26525, USA.
Commun Chem. 2022 Mar 22;5(1):40. doi: 10.1038/s42004-022-00647-x.
Artificial-intelligence-based chemistry models are a promising method for exploring chemical reaction design spaces. However, training datasets built from experimental synthesis typically report only the optimal synthesis reactions. This leads to an inherited bias in the model predictions. Therefore, robust datasets that span the entire solution space are necessary to remove this inherited bias and permit complete training of the space. In this study, an artificial intelligence model based on a Variational AutoEncoder (VAE) has been developed and investigated to synthetically generate continuous datasets. The approach samples the latent space to generate new chemical reactions. The technique is demonstrated by generating over 7,000,000 new reactions from a training dataset containing only 7,000 reactions. The generated reactions include molecular species that are larger and more diverse than those in the training set.
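The generation step described in the abstract, sampling the VAE latent space and decoding new reactions, can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the token vocabulary `VOCAB`, the random linear decoder built by `make_toy_decoder`, and the choice to sample around a single encoded reaction's posterior (`mu`, `log_var`) are all hypothetical stand-ins for a trained VAE.

```python
import math
import random

# Hypothetical token vocabulary for a toy reaction-string decoder
# (an assumption, not the representation used in the paper).
VOCAB = ["C", "O", "N", "=", "(", ")", ".", ">"]


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (the VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def make_toy_decoder(latent_dim, seq_len, rng):
    """Build a random linear decoder: one weight row per vocabulary token, per position.

    Hypothetical stand-in for a trained decoder network."""
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in VOCAB]
        for _ in range(seq_len)
    ]


def decode(z, decoder):
    """Map a latent vector to a token string by greedy (argmax) decoding per position."""
    tokens = []
    for position_weights in decoder:
        logits = [sum(w * x for w, x in zip(row, z)) for row in position_weights]
        tokens.append(VOCAB[max(range(len(logits)), key=logits.__getitem__)])
    return "".join(tokens)


def generate(mu, log_var, decoder, n_samples, seed=0):
    """Sample n_samples latent points around one encoded reaction and decode them.

    Returns the set of distinct decoded strings (duplicates collapse)."""
    rng = random.Random(seed)
    return {decode(reparameterize(mu, log_var, rng), decoder) for _ in range(n_samples)}


if __name__ == "__main__":
    decoder = make_toy_decoder(latent_dim=4, seq_len=6, rng=random.Random(42))
    # Posterior of a hypothetical encoded training reaction: unit Gaussian here.
    mu, log_var = [0.0] * 4, [0.0] * 4
    candidates = generate(mu, log_var, decoder, n_samples=1000)
    print(len(candidates), "distinct candidate strings")
```

Collecting results in a set reflects that nearby latent samples can decode to the same output, so the number of distinct reactions is at most the number of samples. A real pipeline would also need a chemical-validity check on each decoded reaction, which this sketch omits.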