Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore.
J Chem Inf Model. 2021 Apr 26;61(4):1617-1626. doi: 10.1021/acs.jcim.0c01415. Epub 2021 Mar 16.
Efficient molecular featurization is one of the major challenges for machine learning models in drug design. Here we propose, for the first time, persistent Ricci curvature (PRC), and in particular Ollivier PRC (OPRC), for molecular featurization and feature engineering. The filtration process from persistent homology is used to generate a series of nested molecular graphs, and OPRC is defined as the persistence and variation of Ollivier Ricci curvatures on these nested graphs. Persistent attributes, which are statistical and combinatorial properties of OPRCs during the filtration process, serve as molecular descriptors and are combined with machine learning models, in particular gradient boosting trees (GBT). Our OPRC-GBT model is applied to the prediction of protein-ligand binding affinity, one of the key steps in drug design. We extensively test the model on three of the most commonly used data sets from the well-established protein-ligand binding database PDBbind and compare it with existing models. The model achieves state-of-the-art results and has advantages over traditional molecular descriptors.
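The pipeline described in the abstract (distance-based filtration, Ollivier Ricci curvature on each nested graph, summary statistics as descriptors) can be illustrated with a minimal sketch. This is not the authors' implementation. The helper names (`filtration_graph`, `ollivier_curvature`, `oprc_features`) are hypothetical. The sketch also assumes a uniform probability measure on each vertex's neighbors, hop-count graph distance, and min/max/mean/standard deviation as stand-ins for the persistent attributes. The toy coordinates are made up for illustration.

```python
import itertools
import numpy as np
from scipy.optimize import linprog


def filtration_graph(coords, cutoff):
    """Adjacency dict: points i and j are joined when their distance <= cutoff."""
    adj = {i: set() for i in range(len(coords))}
    for i, j in itertools.combinations(range(len(coords)), 2):
        if np.linalg.norm(coords[i] - coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distances(adj, source):
    """Breadth-first shortest-path (hop count) distances from source."""
    dist = {source: 0}
    frontier = [source]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist


def wasserstein1(mu, nu, cost):
    """Optimal transport cost between discrete measures, solved as a linear program."""
    m, n = len(mu), len(nu)
    a_eq = np.zeros((m + n, m * n))  # plan is flattened row-major: index i * n + j
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0  # row i of the plan sums to mu[i]
    for j in range(n):
        a_eq[m + j, j::n] = 1.0  # column j of the plan sums to nu[j]
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    return res.fun


def ollivier_curvature(adj, x, y):
    """Curvature of edge (x, y): 1 - W1(m_x, m_y) / d(x, y), with d(x, y) = 1."""
    nbrs_x, nbrs_y = sorted(adj[x]), sorted(adj[y])
    dist_from = {u: hop_distances(adj, u) for u in nbrs_x}
    cost = np.array([[dist_from[u][v] for v in nbrs_y] for u in nbrs_x], dtype=float)
    mu = np.full(len(nbrs_x), 1.0 / len(nbrs_x))  # assumed: uniform measure on neighbors
    nu = np.full(len(nbrs_y), 1.0 / len(nbrs_y))
    return 1.0 - wasserstein1(mu, nu, cost)


def oprc_features(coords, cutoffs):
    """One row per cutoff: min, max, mean, std of edge curvatures (zeros if no edges)."""
    rows = []
    for r in cutoffs:
        adj = filtration_graph(coords, r)
        kappas = [ollivier_curvature(adj, x, y) for x in adj for y in adj[x] if x < y]
        if kappas:
            rows.append([min(kappas), max(kappas),
                         float(np.mean(kappas)), float(np.std(kappas))])
        else:
            rows.append([0.0, 0.0, 0.0, 0.0])
    return np.array(rows)


# Toy point cloud (hypothetical): a near-equilateral triangle plus one distant point.
coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                   [0.5, 0.866, 0.0], [3.0, 0.0, 0.0]])
features = oprc_features(coords, cutoffs=[0.5, 1.1, 2.2])
print(features)
```

Each row tracks how the curvature distribution changes as the cutoff grows: no edges at 0.5, a triangle at 1.1, and a triangle with a pendant edge at 2.2. Flattening such rows into a single vector gives a fixed-length descriptor that could then be passed to a gradient boosting regressor.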