Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab136.
Artificial intelligence (AI) techniques are increasingly applied across the entire drug design process, from target discovery, lead discovery, lead optimization and preclinical development to the three phases of clinical trials. One of the central challenges for AI-based drug design is molecular featurization, that is, identifying or designing appropriate molecular descriptors or fingerprints. Efficient and transferable molecular descriptors are key to the success of all AI-based drug design models. Here we propose, for the first time, Forman persistent Ricci curvature (FPRC)-based molecular featurization and feature engineering. Molecular structures and interactions are modeled as simplicial complexes, which generalize graphs to higher dimensions. A multiscale representation is then obtained through a filtration process, which generates a series of nested simplicial complexes at different scales. Forman Ricci curvatures (FRCs) are calculated on each complex in the series, and the persistence and variation of FRCs over the filtration are defined as FPRC. Persistent attributes, which are FPRC-based functions and properties, are used as molecular descriptors and combined with machine learning models, in particular gradient boosting trees (GBT). Our FPRC-GBT models are extensively trained and tested on three of the most commonly used datasets: PDBbind-2007, PDBbind-2013 and PDBbind-2016. Our results are better than those from machine learning models with traditional molecular descriptors.
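To make the filtration-plus-curvature idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Vietoris-Rips-style filtration on point coordinates (the abstract does not say which complex is used), fills in triangles wherever three vertices are pairwise connected, and uses the unweighted combinatorial Forman curvature of an edge, F(e) = 4 - deg(u) - deg(v) + 3t, where t is the number of triangles containing e. The function names (`rips_edges`, `forman_edge_curvature`, `persistent_frc_features`) and the choice of summary statistics are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import dist


def rips_edges(points, scale):
    """Edges of the Rips 1-skeleton: pairs of points at distance <= scale."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= scale]


def forman_edge_curvature(edges):
    """Unweighted Forman curvature per edge, with triangles filled in.

    Assumes every 3-clique is a 2-simplex (clique complex up to dimension 2),
    so F(e) = 4 - deg(u) - deg(v) + 3 * (number of triangles containing e).
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    curvature = {}
    for u, v in edges:
        # A common neighbor of u and v closes a triangle on the edge (u, v).
        triangles = len(neighbors[u] & neighbors[v])
        curvature[(u, v)] = (4 - len(neighbors[u]) - len(neighbors[v])
                             + 3 * triangles)
    return curvature


def persistent_frc_features(points, scales):
    """One row per filtration value: (scale, n_edges, min, max, mean) of F(e)."""
    rows = []
    for scale in scales:
        values = list(forman_edge_curvature(rips_edges(points, scale)).values())
        if values:
            rows.append((scale, len(values), min(values), max(values),
                         sum(values) / len(values)))
        else:
            # No edges yet at this scale; use zeros as a placeholder.
            rows.append((scale, 0, 0, 0, 0.0))
    return rows


if __name__ == "__main__":
    # Three collinear "atoms" spaced one unit apart (toy coordinates).
    toy_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    for row in persistent_frc_features(toy_points, [0.5, 1.0, 2.0]):
        print(row)
```

Rows like these, concatenated over all filtration values, could serve as a fixed-length feature vector for a gradient boosting model; the statistics actually used for the FPRC-based persistent attributes are not specified in the abstract.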