Chen Xing, Wang Chun-Chun, Yin Jun, You Zhu-Hong
School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China.
Mol Ther Nucleic Acids. 2018 Dec 7;13:568-579. doi: 10.1016/j.omtn.2018.10.005. Epub 2018 Oct 11.
Since the first microRNA (miRNA) was discovered, many studies have confirmed associations between miRNAs and complex human diseases. Obtaining and exploiting miRNA-disease association information plays an increasingly important role in improving the treatment of complex diseases. However, because traditional experimental methods are costly, many researchers have proposed computational methods to predict potential miRNA-disease associations. In this work, we developed a machine-learning-based computational model, Random Forest for miRNA-disease association (RFMDA) prediction. The training sample set for RFMDA was constructed from the Human microRNA Disease Database (HMDD) version (v.)2.0, and the feature vectors representing miRNA-disease samples were defined by integrating miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity. The Random Forest algorithm was first employed to infer miRNA-disease associations. In addition, a filter-based method was implemented to select robust features from the miRNA-disease feature set, which could efficiently distinguish related miRNA-disease pairs from unrelated miRNA-disease pairs. RFMDA achieved areas under the curve (AUCs) of 0.8891, 0.8323, and 0.8818 ± 0.0014 under global leave-one-out cross-validation, local leave-one-out cross-validation, and 5-fold cross-validation, respectively, which were higher than those of many previous computational models. To further evaluate the accuracy of RFMDA, we carried out three types of case studies on four complex human diseases. As a result, 43 (esophageal neoplasms), 46 (lymphoma), 47 (lung neoplasms), and 48 (breast neoplasms) of the top 50 predicted disease-related miRNAs were verified by experiments in the different kinds of case studies.
The results of cross-validation and case studies indicated that RFMDA is a reliable model for predicting miRNA-disease associations.
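The abstract names Gaussian interaction profile kernel similarity as one ingredient of the feature vectors but does not state its formula. The sketch below uses the commonly cited definition, in which the similarity between two interaction profiles is exp(-γ‖a − b‖²) and γ is normalised by the mean squared norm of all profiles. The function names, the toy association matrix, and the concatenation scheme in `pair_features` are assumptions made for illustration only, not the RFMDA implementation.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows of `profiles`.

    Each row is a binary interaction profile, e.g. one miRNA's known
    associations across all diseases. gamma_prime=1.0 is an assumed default.
    """
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


# Hypothetical toy association matrix: 3 miRNAs (rows) x 2 diseases (columns).
A = [[1, 0],
     [1, 1],
     [0, 1]]

mirna_sim = gip_kernel(A)                                  # 3 x 3
disease_sim = gip_kernel([list(col) for col in zip(*A)])   # 2 x 2


def pair_features(m, d):
    """Assumed scheme: concatenate the miRNA and disease similarity rows."""
    return mirna_sim[m] + disease_sim[d]
```

In a fuller pipeline, vectors such as `pair_features(m, d)` would be combined with the functional and semantic similarities, filtered, and passed to a random forest classifier; those steps are omitted here because the abstract does not specify them.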