Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China.
University of Chinese Academy of Sciences, Beijing, 100049, China.
Sci Rep. 2020 Mar 18;10(1):4972. doi: 10.1038/s41598-020-61616-9.
Drug-disease associations are an important source of information at every stage of drug repositioning. Although high-throughput technologies are identifying a growing number of drug-disease associations, experimental methods remain time-consuming and expensive. As a supplement to them, many computational methods have been developed for accurate in silico prediction of new drug-disease associations. In this work, we present a novel computational model that combines a sparse auto-encoder with a rotation forest (SAEROF) to predict drug-disease associations. Gaussian interaction profile kernel similarity, drug structure similarity and disease semantic similarity were extracted to explore the associations between drugs and diseases. On this basis, a rotation forest classifier based on a sparse auto-encoder is proposed to predict the associations between drugs and diseases. To evaluate the performance of the proposed model, we carried out 10-fold cross-validation on two gold standard datasets, Fdataset and Cdataset. The proposed model achieved AUCs (Area Under the ROC Curve) of 0.9092 on Fdataset and 0.9323 on Cdataset. For further evaluation, we compared SAEROF with the state-of-the-art support vector machine (SVM) classifier and several existing computational models. Three human diseases (Obesity, Stomach Neoplasms and Lung Neoplasms) were explored in case studies, and more than half of the top 20 predicted drugs were confirmed by the Comparative Toxicogenomics Database (CTD). This model is a feasible and effective method for predicting drug-disease associations, and its performance is significantly improved compared with existing methods.
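The abstract names Gaussian interaction profile (GIP) kernel similarity as one of the extracted features but does not give its formula. The sketch below uses the commonly cited formulation, in which the similarity of two entities is exp(-gamma * squared distance between their association profiles) and gamma is a bandwidth normalized by the mean squared profile norm. The function name `gip_kernel`, the parameter `gamma_prime` (default 1.0) and the toy matrix are assumptions made for illustration; the paper's exact settings may differ.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 association matrix.

    profiles: list of equal-length lists; row i is the interaction profile of entity i.
    gamma_prime: assumed bandwidth parameter (1.0 is a common default, not from the paper).
    """
    n = len(profiles)
    if n == 0:
        raise ValueError("profiles must contain at least one row")

    # Normalize the bandwidth by the mean squared norm of the profiles.
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq_norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical toy data: 3 drugs (rows) x 3 diseases (columns), 1 = known association.
toy_assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]

drug_sim = gip_kernel(toy_assoc)                     # drug-drug similarity
disease_profiles = [list(col) for col in zip(*toy_assoc)]
disease_sim = gip_kernel(disease_profiles)           # disease-disease similarity
```

In this toy example the first two drugs share an identical profile, so their similarity is exactly 1, while the third drug, which shares no disease with them, receives a much lower score. Transposing the matrix gives the corresponding disease-disease similarity from the same association data.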