Department of Chemistry, University of Tennessee, Knoxville, TN, 37996-1600, USA.
Department of Mathematics, University of Tennessee, Knoxville, TN, 37996-1320, USA.
Nat Commun. 2020 Jun 26;11(1):3230. doi: 10.1038/s41467-020-17035-5.
Machine learning and high-throughput computational screening are valuable tools for accelerating the first-principles discovery of the next generation of functionalized molecules and materials. Applying machine learning to chemistry requires converting molecular structures into a machine-readable format known as a molecular representation. The choice of representation affects the performance and outcomes of chemical machine learning methods. Herein, we present a new concise molecular representation derived from persistent homology, a branch of applied mathematics. We demonstrate its applicability in a high-throughput computational screening of a large molecular database (GDB-9) with more than 133,000 organic molecules. Our target is to identify novel molecules that selectively interact with CO2. The methodology and performance of the novel molecular fingerprinting method are presented, and the new chemically-driven persistence image representation is used to screen the GDB-9 database to suggest molecules and/or functional groups with enhanced properties.
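As a rough illustration of how a persistence image can turn a molecular geometry into a fixed-length vector, the sketch below computes 0-dimensional persistent homology (connected components) for a small point cloud and rasterizes the resulting (birth, persistence) pairs with Gaussians. It is a simplified stand-in, not the representation described in the abstract: it ignores higher-dimensional features and any chemically-driven weighting, and the function names, the grid parameters (`resolution`, `sigma`, `max_value`), and the approximate CO2 coordinates are all assumptions made for the example.

```python
import numpy as np


def h0_persistence(points):
    """0-dimensional persistence pairs of a distance-based (Rips) filtration.

    Every atom is a component born at 0; a component dies at the length of
    the edge that first merges it into another one (a minimum-spanning-tree
    edge found with union-find).
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):
        # Find the component root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies at d
            parent[ri] = rj
            pairs.append((0.0, float(d)))
    return np.array(pairs, dtype=float).reshape(-1, 2)


def persistence_image(pairs, resolution=10, sigma=0.2, max_value=3.0):
    """Sum persistence-weighted Gaussians centered on (birth, persistence)."""
    centers = (np.arange(resolution) + 0.5) * max_value / resolution
    birth_grid, pers_grid = np.meshgrid(centers, centers, indexing="xy")
    image = np.zeros((resolution, resolution))
    for birth, death in pairs:
        pers = death - birth
        sq_dist = (birth_grid - birth) ** 2 + (pers_grid - pers) ** 2
        image += pers * np.exp(-sq_dist / (2.0 * sigma ** 2))
    return image


# Hypothetical input: linear CO2 with an approximate C=O distance of 1.16 angstrom.
co2 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.16), (0.0, 0.0, -1.16)]
pairs = h0_persistence(co2)
image = persistence_image(pairs)
vector = image.ravel()  # fixed-length feature vector for a downstream model
```

Because the image grid is fixed, molecules with different numbers of atoms map to vectors of the same length, which is what allows a representation of this kind to be fed to standard regression or classification models.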