Li Zhanchao, Li Xiaoyu, Tang Xiuli, Wang Yan
School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China.
Mol Divers. 2025 Aug 27. doi: 10.1007/s11030-025-11337-w.
The identification of relationships between drugs and proteins helps not only in the study of pathological mechanisms but also in drug repositioning. However, conventional wet-lab methods are often time-consuming, labour-intensive, and of low accuracy. A computational method for the rapid and precise identification of drug-protein relationships is therefore needed. In this study, a self-attention-based multi-source and cascade framework (AMCF-RDP) is developed to identify drug-protein relationships. Embedding features and network topology features derived from a knowledge graph and a complex network were employed to characterize drug-protein relationships. A two-layer model, built from an attention mechanism and fully connected layers, was used to predict whether a drug interacts with a protein and what type of interaction it is. The efficacy of the proposed method was evaluated and confirmed on non-redundant datasets, through ablation experiments, and by comparison with machine learning algorithms and other state-of-the-art methods. Fivefold cross-validation shows that the method identifies drug-protein interactions quickly and accurately, with an accuracy of 90.21%, a sensitivity of 90.35%, and a Matthews correlation coefficient (MCC) of 0.8043. It also distinguishes the types of drug-protein interaction, achieving a macro-recall of 93.43% and a macro-F1 score of 0.9381. Compared with the methods described in the literature, the proposed method achieved an area under the receiver operating characteristic curve (AUC) of 0.9176, an improvement of 0.4746. A total of 100,000 drug-protein associations were identified, some of which were confirmed through molecular docking, KEGG, and Gene Ontology analyses.
AMCF-RDP has been shown to significantly improve the identification of drug-protein relationships, and it is expected to serve as a valuable tool for drug development and for investigating mechanisms of action.
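The abstract describes a cascade in which a first layer decides whether a drug-protein pair interacts and a second layer assigns the interaction type, and it reports accuracy, sensitivity, MCC, macro-recall and macro-F1. The following is a minimal Python sketch of that two-stage routing and of those metrics, assuming plain stand-in scoring functions in place of the attention-based layers. The names `cascade_predict`, `binary_metrics` and `macro_recall_f1`, the 0.5 threshold, the rule that only pairs passing stage 1 reach stage 2, and the interaction-type labels are all hypothetical illustrations, not the authors' code.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical (drug_id, protein_id) pair; identifiers are illustrative only.
Pair = Tuple[str, str]


def cascade_predict(
    pairs: Sequence[Pair],
    interaction_score: Callable[[Pair], float],
    classify_type: Callable[[Pair], str],
    threshold: float = 0.5,
) -> List[Tuple[bool, Optional[str]]]:
    """Stage 1 scores each pair; only pairs scoring at or above `threshold`
    are passed to stage 2, which assigns an interaction type (assumed routing).
    Pairs rejected by stage 1 get None as their type."""
    results: List[Tuple[bool, Optional[str]]] = []
    for pair in pairs:
        if interaction_score(pair) >= threshold:
            results.append((True, classify_type(pair)))
        else:
            results.append((False, None))
    return results


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall of the interacting class) and MCC for stage 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


def macro_recall_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Unweighted mean of per-class recall and per-class F1 for stage 2,
    over every class that appears in either the true or predicted labels."""
    labels = set(y_true) | set(y_pred)
    recalls: List[float] = []
    f1s: List[float] = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


if __name__ == "__main__":
    # Toy stand-in scorers; a real model would score learned pair features.
    demo = cascade_predict(
        [("D1", "P1"), ("D2", "P2")],
        interaction_score=lambda p: 0.9 if p[0] == "D1" else 0.2,
        classify_type=lambda p: "inhibitor",
    )
    print(demo)  # [(True, 'inhibitor'), (False, None)]
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
    # {'accuracy': 0.75, 'sensitivity': 0.75, 'mcc': 0.5}
    print(macro_recall_f1(["inhibitor", "inhibitor", "agonist", "antagonist"],
                          ["inhibitor", "agonist", "agonist", "antagonist"]))
```

Routing only stage-1 positives to stage 2 keeps the type classifier from having to handle non-interacting pairs. Macro averaging weights each interaction type equally, so rare types count as much as common ones; in the described framework the two stand-in scorers would be replaced by the attention and fully connected layers operating on the knowledge-graph and network-topology features.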