Department of Mathematics, Michigan State University, East Lansing, Michigan, 48824.
School of Medicine, Foshan University, Foshan, Guangdong, 528000, People's Republic of China.
J Comput Chem. 2018 Jul 30;39(20):1444-1454. doi: 10.1002/jcc.25213. Epub 2018 Apr 6.
Aqueous solubility and partition coefficient are important physical properties of small molecules, and their accurate theoretical prediction plays an important role in drug design and discovery. Prediction accuracy depends crucially on molecular descriptors, which are typically derived from a theoretical understanding of the chemistry and physics of small molecules. This work introduces an algebraic topology-based method, called element-specific persistent homology (ESPH), as a new representation of small molecules that is entirely different from conventional chemical and/or physical representations. ESPH describes molecular properties in terms of multiscale and multicomponent topological invariants. Such a topological representation is systematic, comprehensive, and scalable with respect to variations in molecular size and composition. However, it cannot be literally translated into a physical interpretation. Fortunately, it is readily suitable for machine learning methods, giving rise to topological learning algorithms. Because solubility and partition coefficient are inherently correlated, a uniform ESPH representation is developed for both properties, which facilitates the use of multi-task deep neural networks for their simultaneous prediction. This strategy leads to more accurate predictions on relatively small datasets. A total of six datasets are considered in this work to validate the proposed topological and multi-task deep learning approaches. The proposed approaches are shown to achieve some of the most accurate predictions of aqueous solubility and partition coefficient. Our software is available online at http://weilab.math.msu.edu/TopP-S/. © 2018 Wiley Periodicals, Inc.
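The idea of element-specific, multiscale topological invariants can be illustrated with a minimal sketch. It is not the TopP-S implementation and makes several simplifying assumptions: only dimension-0 persistent homology (connected components) of a Vietoris-Rips filtration is computed, a bar is taken to die at the edge length that merges its component, and the function names, element subsets, bin edges, and toy coordinates are all hypothetical.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return death values of the finite H0 bars of a Vietoris-Rips filtration.

    Every component is born at 0. A bar dies at the length of the edge that
    merges its component into another one (Kruskal's algorithm with union-find).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite bars; the one essential bar is omitted


def element_specific_features(atoms, element_sets, bins):
    """Count H0 deaths per distance bin, separately for each element subset."""
    features = []
    for elements in element_sets:
        pts = [xyz for symbol, xyz in atoms if symbol in elements]
        deaths = h0_barcode(pts)
        for lo, hi in zip(bins[:-1], bins[1:]):
            features.append(sum(lo <= d < hi for d in deaths))
    return features


# Hypothetical toy "molecule": element symbols with made-up coordinates.
atoms = [
    ("H", (-1.0, 0.0, 0.0)),
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("O", (2.7, 0.0, 0.0)),
]
element_sets = [{"C"}, {"C", "O"}, {"C", "O", "H"}]  # assumed subsets
bins = [0.0, 1.3, 2.0]  # assumed distance bins

print(element_specific_features(atoms, element_sets, bins))
```

Each element subset contributes its own block of the fixed-length feature vector, so molecules of different size and composition map to inputs of the same dimension. A vector of this kind could then feed a network with shared hidden layers and separate output heads for solubility and partition coefficient, which is the general shape of a multi-task setup.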