King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia.
The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA.
Bioinformatics. 2018 Apr 1;34(7):1164-1173. doi: 10.1093/bioinformatics/btx731.
Computationally finding drug-target interactions (DTIs) is a convenient strategy to identify new DTIs at low cost with reasonable accuracy. However, current DTI prediction methods suffer from a high false positive prediction rate.
We developed DDR, a novel method that improves DTI prediction accuracy. DDR is based on a heterogeneous graph that contains known DTIs together with multiple similarities between drugs and multiple similarities between target proteins. DDR applies a non-linear similarity fusion method to combine the different similarities. Before fusion, DDR performs a pre-processing step in which a subset of similarities is selected by a heuristic process to obtain an optimized combination of similarities. DDR then applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using five repeats of 10-fold cross-validation, three testing setups, and the weighted average of the area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best state-of-the-art method for predicting DTIs: by 34% when the drugs are new, by 23% when the targets are new, and by 34% when the drugs and the targets are known but not all DTIs between them are known. Using independent sources of evidence, we verify as correct 22 of the top 25 novel DDR predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs.
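The similarity-fusion step can be illustrated with a minimal sketch. It assumes an SNF-style (similarity network fusion) cross-diffusion, used here as an illustration rather than DDR's exact implementation. The function names `full_kernel`, `sparse_kernel` and `snf` and the parameters `k` and `iterations` are hypothetical, and the heuristic similarity-selection step is not shown. Each input is assumed to be a square, positive similarity matrix over the same set of drugs (or the same set of targets).

```python
import numpy as np


def full_kernel(W):
    """Scale off-diagonal entries so each row sums to 1/2, and set the diagonal to 1/2."""
    W = np.asarray(W, dtype=float)
    off = W - np.diag(np.diag(W))            # drop self-similarity
    row = off.sum(axis=1, keepdims=True)     # assumes every row has positive off-diagonal mass
    P = off / (2.0 * row)
    np.fill_diagonal(P, 0.5)
    return P


def sparse_kernel(W, k):
    """Keep each row's k most similar neighbours (excluding self), then row-normalise."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    S = np.zeros_like(W)
    for i in range(n):
        order = np.argsort(-W[i])                     # most similar first
        nbrs = [j for j in order if j != i][:k]
        S[i, nbrs] = W[i, nbrs]
        S[i] /= S[i].sum()
    return S


def snf(similarities, k=3, iterations=20):
    """Fuse several n x n similarity matrices by SNF-style cross-diffusion (simplified sketch)."""
    if len(similarities) < 2:
        raise ValueError("need at least two similarity matrices to fuse")
    Ps = [full_kernel(W) for W in similarities]
    Ss = [sparse_kernel(W, k) for W in similarities]
    m = len(Ps)
    for _ in range(iterations):
        updated = []
        for v in range(m):
            # average of the *other* views' current kernels
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            # diffuse them through view v's local neighbourhood structure
            P = Ss[v] @ others @ Ss[v].T
            updated.append((P + P.T) / 2.0)           # keep each kernel symmetric
        Ps = updated
    fused = sum(Ps) / m
    return (fused + fused.T) / 2.0
```

For example, calling `snf([A, B], k=3)` on two symmetric 6 x 6 drug-similarity matrices returns a single symmetric 6 x 6 fused matrix. The iterated matrix products make the combination non-linear (unlike a weighted average), and the sparse kernel limits diffusion to each node's k strongest neighbours, which damps weak, noisy similarities.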
The data and code are provided at https://bitbucket.org/RSO24/ddr/.
Supplementary data are available at Bioinformatics online.