Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, 500007, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India.
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India; Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, Assam, 785 006, India.
Comput Biol Med. 2021 Nov;138:104856. doi: 10.1016/j.compbiomed.2021.104856. Epub 2021 Sep 10.
Machine learning and data-driven approaches are currently widely used in drug discovery and development because they can support decision-making with data leveraged from existing sources. Applied to drug repurposing (DR), these approaches can reveal new relationships between drug molecules, therapeutic targets and diseases, which can in turn provide new insights for developing novel therapeutics. In the current study, a dataset of 1671 approved drugs was analyzed for DR using a combined approach involving unsupervised machine learning (ML) techniques (principal component analysis (PCA) followed by k-means clustering) and structure-activity relationship (SAR) predictions. PCA was applied to all two-dimensional (2D) molecular descriptors of the dataset, and the first five principal components (PCs) were then used to cluster the drugs into nine well-separated clusters with the k-means algorithm. We further predicted the biological activities of the drugs in the dataset using the PASS (Prediction of Activity Spectra for Substances) tool. These predicted activity values were analyzed systematically to identify repurposable drugs for various diseases. The clustering patterns obtained from k-means showed that every cluster contains subgroups of structurally similar drugs that may or may not share therapeutic indications. We hypothesized that such structurally similar but therapeutically different drugs can be repurposed for the native indications of other drugs in the same cluster, based on their high predicted biological activities from the PASS analysis. In line with this, we identified 66 drugs from the nine clusters that are structurally similar but have different therapeutic uses, and that can therefore be repurposed for one or more native indications of other drugs in the same cluster.
Some of these drugs not only share a common substructure but also bind to the same target and may have a similar mechanism of action, further supporting our hypothesis. Furthermore, based on the analysis of predicted biological activities, we identified 1423 drugs that can be repurposed for 366 new indications across several diseases. In this study, an integrated approach of unsupervised ML and SAR analysis has been used to identify new indications for approved drugs, and the study provides novel insights into the clustering patterns generated through descriptor-level analysis of approved drugs.
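The descriptor-clustering step (PCA on the 2D descriptors, keep the first five PCs, then k-means with nine clusters) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: descriptor calculation is out of scope (a random matrix stands in for the drugs × descriptors table), descriptors are assumed to be z-score scaled before PCA (the abstract does not say), and k-means is a plain Lloyd iteration with a fixed seed rather than whatever implementation and initialization the study used.

```python
import numpy as np

def standardize(X):
    """Z-score each descriptor column (ASSUMED preprocessing); constant columns map to 0."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant descriptors
    return (X - mu) / sd

def pca_scores(X, n_components=5):
    """Project centered data onto its leading principal components using SVD.

    Returns the PC scores (n_samples x n_components) and the fraction of
    total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

def kmeans(Z, k=9, n_iter=100, seed=0):
    """Plain Lloyd's k-means with initial centroids sampled from the data."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid: shape (n, k).
        d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Placeholder data only: random values standing in for a real descriptor matrix.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 40))
scores, explained = pca_scores(standardize(X), n_components=5)
labels, centroids = kmeans(scores, k=9)
print(scores.shape, np.bincount(labels, minlength=9))
```

On a real descriptor table, each drug would receive one of nine cluster labels. Because Lloyd's algorithm only reaches a local optimum, in practice one would typically run it from several initializations and keep the solution with the lowest within-cluster sum of squares.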
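The within-cluster repurposing hypothesis can be expressed as a simple filtering rule: for each pair of drugs in the same cluster, propose drug A for an indication that is native to drug B (but not to A) when PASS predicts that activity for A with a high probability of being active (Pa). The details below are hypothetical: the dictionaries `cluster_of`, `indications_of` and `pass_pa`, the assumption that native indications have already been mapped onto PASS activity names, and the cutoff `pa_threshold=0.7` are illustrative choices, not values reported in the abstract. The drug and activity names in the example are placeholders.

```python
from collections import defaultdict

def repurposing_candidates(cluster_of, indications_of, pass_pa, pa_threshold=0.7):
    """Propose (drug, new_indication, source_drug, Pa) tuples within clusters.

    cluster_of:     drug -> cluster label (e.g. from k-means)
    indications_of: drug -> set of native (approved) indications
    pass_pa:        drug -> {activity name: PASS Pa value in [0, 1]}
    pa_threshold:   HYPOTHETICAL Pa cutoff for calling a prediction "high"
    """
    members = defaultdict(list)
    for drug, cluster in cluster_of.items():
        members[cluster].append(drug)

    candidates = []
    for drugs in members.values():
        for a in drugs:
            for b in drugs:
                if a == b:
                    continue
                # Only indications native to b that a is not already used for.
                for ind in indications_of[b] - indications_of[a]:
                    pa = pass_pa.get(a, {}).get(ind, 0.0)
                    if pa >= pa_threshold:
                        candidates.append((a, ind, b, pa))
    return sorted(candidates)

# Placeholder example (not data from the study).
cluster_of = {"drug_A": 0, "drug_B": 0, "drug_C": 1}
indications_of = {
    "drug_A": {"antihypertensive"},
    "drug_B": {"antidiabetic"},
    "drug_C": {"antibacterial"},
}
pass_pa = {
    "drug_A": {"antidiabetic": 0.82},
    "drug_B": {"antihypertensive": 0.41},
    "drug_C": {"antidiabetic": 0.95},
}
print(repurposing_candidates(cluster_of, indications_of, pass_pa))
```

In the example, drug_C has a high predicted antidiabetic Pa but is not proposed, because the rule only transfers indications between drugs that share a cluster; dropping the cluster constraint and thresholding Pa across the whole dataset would correspond more closely to the broader, activity-only screen described in the abstract.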