School of Life Sciences and Biotechnology, Shanghai Jiao Tong University.
School of Medicine, Jiangnan University, Wuxi, China.
Brief Bioinform. 2021 Jan 18;22(1):451-462. doi: 10.1093/bib/bbz152.
Drug-target interactions (DTIs) play a crucial role in target-based drug discovery and development. Computational prediction of DTIs can effectively complement experimental wet-lab techniques for identifying DTIs, which are typically time- and resource-consuming. However, current DTI prediction approaches suffer from low precision and high false-positive rates. In this study, we develop a novel DTI prediction method, named DTI-CDF, that aims to improve prediction performance. It is based on a cascade deep forest (CDF) model and uses multiple similarity-based features between drugs and between target proteins, extracted from a heterogeneous graph that contains known DTIs. In the experiments, we ran five replicates of 10-fold cross-validation under three different experimental settings, in which the DTI values of certain drugs (SD), targets (ST), or drug-target pairs (SP) are missing from the training sets but present in the test sets. The experimental results demonstrate that DTI-CDF achieves significantly higher performance than traditional ensemble learning-based methods such as random forest and XGBoost, deep neural networks, and state-of-the-art methods such as DDR. Furthermore, 1352 newly predicted DTIs are confirmed by the KEGG and DrugBank databases. The data sets and source code are freely available at https://github.com//a96123155/DTI-CDF.
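The three cross-validation settings described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The function names `split_folds` and `cv_splits`, the integer indexing of drugs and targets, and the shuffling scheme are all assumptions made for illustration. In SP, individual drug-target pairs are held out; in SD and ST, every pair involving a held-out drug or target is moved to the test set, so the model never sees those drugs or targets during training.

```python
import random


def split_folds(items, n_folds, seed):
    """Shuffle items and deal them into n_folds roughly equal folds."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def cv_splits(n_drugs, n_targets, setting, n_folds=10, seed=0):
    """Yield (train_pairs, test_pairs) for one replicate of n-fold CV.

    Hypothetical helper: drugs and targets are plain integer indices.
    setting "SP" holds out drug-target pairs, "SD" holds out whole
    drugs, and "ST" holds out whole targets.
    """
    pairs = [(d, t) for d in range(n_drugs) for t in range(n_targets)]
    if setting == "SP":
        for held_out in split_folds(pairs, n_folds, seed):
            held = set(held_out)
            train = [p for p in pairs if p not in held]
            test = [p for p in pairs if p in held]
            yield train, test
    elif setting in ("SD", "ST"):
        # Position 0 of a pair is the drug, position 1 is the target.
        axis = 0 if setting == "SD" else 1
        n_entities = n_drugs if setting == "SD" else n_targets
        for held_out in split_folds(range(n_entities), n_folds, seed):
            held = set(held_out)
            train = [p for p in pairs if p[axis] not in held]
            test = [p for p in pairs if p[axis] in held]
            yield train, test
    else:
        raise ValueError("setting must be 'SP', 'SD', or 'ST'")


if __name__ == "__main__":
    # Five replicates of 10-fold CV, here using a different seed per replicate.
    for replicate in range(5):
        for train, test in cv_splits(20, 5, "SD", n_folds=10, seed=replicate):
            train_drugs = {d for d, _ in train}
            test_drugs = {d for d, _ in test}
            assert train_drugs.isdisjoint(test_drugs)
```

Under SD and ST the test pairs involve entities with no known interactions in the training data, which is generally a harder evaluation than SP, where both the drug and the target of a held-out pair may still appear in other training pairs.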