Chen Feng, Zhao Zhigang, Ren Zheng, Lu Kun, Yu Yang, Wang Wenyan
School of Advanced Manufacturing Engineering, Hefei University, Hefei, China.
School of Electrical and Information Engineering, Anhui University of Technology, Ma'anshan, Anhui, China.
PLoS One. 2025 Mar 6;20(3):e0318420. doi: 10.1371/journal.pone.0318420. eCollection 2025.
Drug-target interactions (DTIs) play a crucial role in drug discovery and development. Computational prediction of DTIs can effectively complement experimental identification techniques, which are time-consuming and expensive. However, current computational models suffer from low accuracy and high false-positive rates, especially on datasets with extremely imbalanced classes. To accurately identify interactions between drugs and target proteins, this work extracts a variety of descriptors that capture the characteristic information of drugs and targets and feeds them to the ensemble method random forest (RF). Random projection is adopted to reduce the feature dimension and thereby simplify model computation. In addition, to balance the number of samples across classes, the down-sampling method NearMiss (NM), which can control the number of retained samples, is used. On the gold standard datasets (nuclear receptors, ion channels, GPCRs, and enzymes), the proposed method achieves auROC values of 92.26%, 98.21%, 97.65%, and 99.33%, respectively. The experimental results show that the proposed method performs significantly better than state-of-the-art methods in predicting drug-target interactions.
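The two preprocessing steps named in the abstract, random projection and NearMiss down-sampling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state which random projection or which NearMiss variant was used, so the Gaussian projection matrix, the NearMiss-1 selection rule, the synthetic data, the 500-to-64 dimension reduction, and `n_neighbors=3` are all assumptions made for illustration. The function names `gaussian_random_projection` and `nearmiss1` are hypothetical.

```python
import numpy as np


def gaussian_random_projection(X, n_components, rng):
    """Project X of shape (n_samples, n_features) down to n_components columns."""
    n_features = X.shape[1]
    # Entries ~ N(0, 1/n_components), so squared distances are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_features, n_components))
    return X @ R


def nearmiss1(X, y, n_neighbors=3):
    """NearMiss-1 for two classes: keep the majority samples whose mean distance
    to their n_neighbors closest minority samples is smallest, until the two
    classes have the same size."""
    classes, counts = np.unique(y, return_counts=True)
    if counts.min() == counts.max():
        return X, y  # already balanced, nothing to remove
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)
    # Pairwise Euclidean distances, shape (n_majority, n_minority).
    diffs = X[maj_idx][:, None, :] - X[min_idx][None, :, :]
    dist = np.linalg.norm(diffs, axis=2)
    k = min(n_neighbors, len(min_idx))
    mean_dist = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    keep_maj = maj_idx[np.argsort(mean_dist)[: len(min_idx)]]
    keep = np.sort(np.concatenate([min_idx, keep_maj]))
    return X[keep], y[keep]


# Synthetic stand-in for drug-target descriptor vectors (assumption: 20
# interacting pairs labelled 1, 200 non-interacting pairs labelled 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(220, 500))
y = np.array([1] * 20 + [0] * 200)

X_low = gaussian_random_projection(X, n_components=64, rng=rng)
X_bal, y_bal = nearmiss1(X_low, y, n_neighbors=3)
print(X_low.shape, X_bal.shape, np.bincount(y_bal))
```

The balanced, reduced matrix `X_bal` with labels `y_bal` would then be passed to a random forest classifier. That step is omitted here to keep the sketch dependency-free.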