Xie Liangxu, Xu Lei, Kong Ren, Chang Shan, Xu Xiaojun
Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China.
Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, China.
Front Pharmacol. 2020 Dec 18;11:606668. doi: 10.3389/fphar.2020.606668. eCollection 2020.
The accurate prediction of the physical properties and bioactivity of drug molecules in deep learning depends on how molecules are represented. Many types of molecular descriptors have been developed for quantitative structure-activity/property relationship (QSAR/QSPR) studies. However, each molecular descriptor is optimized for a specific application and carries its own encoding preference. Considering that a standalone featurization method may cover only part of the information in a chemical molecule, we propose building a conjoint fingerprint by combining two complementary fingerprints. The impact of the conjoint fingerprint and of each standalone fingerprint on predictive performance was systematically evaluated for predicting the logarithm of the partition coefficient (logP) and protein-ligand binding affinity using machine learning/deep learning (ML/DL) methods, including random forest (RF), support vector regression (SVR), extreme gradient boosting (XGBoost), long short-term memory network (LSTM), and deep neural network (DNN). The results demonstrated that the conjoint fingerprint yielded improved predictive performance, even outperforming the consensus model built from the two standalone fingerprints for four of the five examined methods. Given that the conjoint fingerprint scheme is easy to extend and widely applicable, we expect that it will create new opportunities for continuously improving the predictive performance of deep learning by harnessing the complementarity of various types of fingerprints.
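The difference between a conjoint fingerprint and a consensus model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that "combining" means concatenating the two bit vectors, and that the consensus model averages the predictions of two models trained on the standalone fingerprints. The function names, the toy bit vectors, and the toy predictors are all hypothetical.

```python
from typing import Callable, List, Sequence


def conjoint_fingerprint(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[int]:
    """Combine two fingerprints into one feature vector.

    Assumption: combination is plain concatenation, so a single model
    sees the bits of both fingerprints at once.
    """
    return list(fp_a) + list(fp_b)


def consensus_prediction(
    predict_a: Callable[[Sequence[int]], float],
    predict_b: Callable[[Sequence[int]], float],
    fp_a: Sequence[int],
    fp_b: Sequence[int],
) -> float:
    """Average the outputs of two models, each using one standalone fingerprint.

    Assumption: the consensus is an unweighted mean of the two predictions.
    """
    return 0.5 * (predict_a(fp_a) + predict_b(fp_b))


# Hypothetical toy fingerprints for one molecule (real ones are much longer).
key_based_fp = [1, 0, 1, 1]          # e.g. a substructure-key style fingerprint
hashed_fp = [0, 1, 0, 0, 1, 0]       # e.g. a hashed circular style fingerprint

# Conjoint scheme: one input vector for one model.
conjoint = conjoint_fingerprint(key_based_fp, hashed_fp)

# Consensus scheme: two separate (toy, untrained) predictors, then averaging.
toy_model_a = lambda fp: 0.5 * sum(fp)
toy_model_b = lambda fp: 0.25 * sum(fp)
consensus = consensus_prediction(toy_model_a, toy_model_b, key_based_fp, hashed_fp)
```

In the conjoint scheme a single learner can, in principle, exploit interactions between bits from both fingerprints, whereas the consensus scheme only merges the two models at the prediction stage.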