School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China.
RNA Biol. 2019 Mar;16(3):257-269. doi: 10.1080/15476286.2019.1568820. Epub 2019 Jan 28.
MicroRNAs (miRNAs) play an important role in the prevention, diagnosis and treatment of complex human diseases. Predicting potential miRNA-disease associations could provide important prior information for medical researchers. Reliable computational models are therefore expected to be an effective supplement for inferring associations between miRNAs and diseases. In this study, we developed a novel computational model named Negative Samples Extraction based MiRNA-Disease Association prediction (NSEMDA). NSEMDA filtered reliable negative samples using two positive-unlabeled learning techniques, Spy and Rocchio, and calculated similarity weights for ambiguous samples. The positive samples, the reliable negative samples and the ambiguous samples with their similarity weights were then used to construct a Support Vector Machine-Similarity Weight model to predict miRNA-disease associations. By introducing ambiguous samples with similarity weights into the training of the prediction model, NSEMDA improved the credibility of the negative samples and reduced the impact of noisy samples. As a result, NSEMDA achieved an AUC of 0.8899 in global leave-one-out cross validation (LOOCV) and an AUC of 0.8353 in local LOOCV. Over 100 repetitions of 5-fold cross validation, NSEMDA obtained an average AUC of 0.8878 with a standard deviation of 0.0014. These AUCs are higher than those of many classical models. In addition, we carried out three kinds of case studies to evaluate the performance of NSEMDA. Among the top 50 potentially related miRNAs predicted by NSEMDA for esophageal neoplasms, lung neoplasms and hepatocellular carcinoma, 46, 50 and 45 miRNAs, respectively, were confirmed by experimental evidence to be associated with the investigated disease. Therefore, NSEMDA would be a reliable computational model for inferring miRNA-disease associations.
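To make the negative-sample extraction step more concrete, the following is a minimal sketch of a Rocchio-style split of unlabeled samples into reliable negatives and ambiguous samples, followed by a simple similarity weight for the ambiguous ones. It is an illustration under stated assumptions, not the NSEMDA implementation: the function names, the toy feature vectors, the prototype coefficients `alpha=16` and `beta=4`, and the weight formula are all hypothetical choices made here, and the Spy step and the weighted SVM are not shown.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def build_prototypes(positives, unlabeled, alpha=16.0, beta=4.0):
    """Rocchio prototypes, treating the unlabeled set as provisional negatives.

    alpha and beta are assumed values, not taken from the abstract.
    """
    pos_mean = mean_vector([normalize(v) for v in positives])
    unl_mean = mean_vector([normalize(v) for v in unlabeled])
    pos_proto = [alpha * p - beta * u for p, u in zip(pos_mean, unl_mean)]
    neg_proto = [alpha * u - beta * p for p, u in zip(pos_mean, unl_mean)]
    return pos_proto, neg_proto


def split_unlabeled(unlabeled, pos_proto, neg_proto):
    """Indices of reliable negatives (closer to the negative prototype)
    and of the remaining samples, which are treated here as ambiguous."""
    reliable, ambiguous = [], []
    for i, v in enumerate(unlabeled):
        if cosine(v, neg_proto) > cosine(v, pos_proto):
            reliable.append(i)
        else:
            ambiguous.append(i)
    return reliable, ambiguous


def positive_weight(v, pos_proto, neg_proto):
    """Hypothetical similarity weight in [0, 1]: the share of (non-negative)
    cosine similarity that points toward the positive prototype."""
    sp = max(cosine(v, pos_proto), 0.0)
    sn = max(cosine(v, neg_proto), 0.0)
    if sp + sn == 0:
        return 0.5
    return sp / (sp + sn)


# Toy two-dimensional features standing in for miRNA-disease pair features.
positives = [[1.0, 0.1], [0.9, 0.2]]
unlabeled = [[0.1, 1.0], [0.2, 0.9], [1.0, 0.15]]

pos_proto, neg_proto = build_prototypes(positives, unlabeled)
reliable, ambiguous = split_unlabeled(unlabeled, pos_proto, neg_proto)
weights = {i: positive_weight(unlabeled[i], pos_proto, neg_proto) for i in ambiguous}
```

In a pipeline of the kind the abstract describes, the reliable negatives would be used as ordinary negative training samples, while each ambiguous sample would enter the classifier with its similarity weight rather than a hard label.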