Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand.
Advanced Virtual and Intelligent Computing (AVIC) center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand.
Sci Prog. 2022 Jul-Sep;105(3):368504221109215. doi: 10.1177/00368504221109215.
Identifying new therapeutic indications for existing drugs is a major challenge in drug repositioning. Most computational drug repositioning methods focus on known targets. Analyzing multiple aspects of protein associations offers an opportunity to discover underlying drug-associated proteins that can improve the performance of drug repositioning approaches. In this study, machine learning models were developed based on the similarities of diverse biological features, including protein interaction, network topology, sequence alignment, and biological function, to predict protein pairs associated with the same drugs. The crucial set of features was identified, and protein pair prediction achieved an area under the curve (AUC) of more than 93%. Based on drug chemical structures, the drug similarity levels of the promising protein pairs were used to quantify the inferred drug-associated proteins. Furthermore, these proteins were used to build an augmented drug-protein matrix to enhance three existing drug repositioning techniques: similarity constrained matrix factorization for drug-disease associations (SCMFDD), an ensemble meta-paths and singular value decomposition (EMP-SVD) model, and a topology similarity and singular value decomposition (TS-SVD) technique. Compared with the original matrix, the augmented matrix improved performance by up to 4% for SCMFDD and EMP-SVD, and by about 1% for TS-SVD. In summary, inferring new protein pairs related to the same drugs increases the opportunity to reveal missing drug-associated proteins that are important for drug development via drug repositioning.
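The idea of an augmented drug-protein matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment_drug_protein_matrix`, the generic per-pair `pair_scores`, and the `threshold` cutoff are all hypothetical, and the abstract does not specify how drug similarity levels are turned into inferred associations. The sketch simply assumes that when a predicted protein pair passes the cutoff, each protein inherits the known drug associations of its partner.

```python
import numpy as np


def augment_drug_protein_matrix(matrix, predicted_pairs, pair_scores, threshold=0.5):
    """Return a copy of a binary drugs-by-proteins matrix with inferred links added.

    Hypothetical rule (an assumption, not taken from the paper): for every
    predicted protein pair (p, q) whose score is at least `threshold`, any drug
    known to be associated with p is also linked to q, and vice versa.
    """
    augmented = matrix.copy()
    for (p, q), score in zip(predicted_pairs, pair_scores):
        if score < threshold:
            continue
        # Read from the ORIGINAL matrix so inferred links do not chain transitively.
        shared = np.maximum(matrix[:, p], matrix[:, q])
        augmented[:, p] = np.maximum(augmented[:, p], shared)
        augmented[:, q] = np.maximum(augmented[:, q], shared)
    return augmented


# Toy data: 3 drugs (rows) x 3 proteins (columns), one known protein per drug.
matrix = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
predicted_pairs = [(0, 1), (1, 2)]  # protein index pairs from a classifier
pair_scores = [0.9, 0.2]            # illustrative scores; only the first passes

augmented = augment_drug_protein_matrix(matrix, predicted_pairs, pair_scores)
print(augmented)
```

In this toy case only the pair (0, 1) passes the cutoff, so drugs 0 and 1 become linked to both proteins 0 and 1, while protein 2 is left unchanged. The augmented matrix would then replace the original drug-protein matrix as input to a downstream repositioning method.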