Ding Yulian, Chen Bolin, Lei Xiujuan, Liao Bo, Wu Fang-Xiang
Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 1L5, Canada.
School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China.
Comput Biol Chem. 2020 May 19;87:107287. doi: 10.1016/j.compbiolchem.2020.107287.
Circular RNAs (circRNAs), a large group of small endogenous noncoding RNA molecules, have been shown to modulate protein-coding genes in the human genome. In recent years, many experimental studies have demonstrated that circRNAs are dysregulated in a number of diseases and can serve as biomarkers for disease diagnosis and prognosis. However, identifying circRNA-disease associations by biological experiments is expensive and time-consuming, and few computational models have been proposed for predicting novel circRNA-disease associations. In this study, we develop a computational model based on random walk and logistic regression (RWLR) to predict circRNA-disease associations. First, a circRNA-circRNA similarity network is constructed by calculating the functional similarity of circRNAs based on circRNA-related gene ontology. Then, a random walk with restart is run on the circRNA similarity network, and features for each circRNA-disease pair are extracted from the results of the random walk and the circRNA-disease association matrix. Finally, a logistic regression model is used to predict novel circRNA-disease associations. Leave-one-out cross-validation (LOOCV), five-fold cross-validation (5CV) and ten-fold cross-validation (10CV) are adopted to evaluate the prediction performance of RWLR in comparison with two recent methods, PWCDA and DWNN-RLS. The experimental results show that RWLR achieves higher AUC values than both methods under LOOCV, 5CV and 10CV, which indicates that RWLR performs better than these two computational methods. In addition, case studies illustrate the reliability and effectiveness of RWLR for circRNA-disease association prediction.
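The random walk with restart step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the restart probability of 0.7, the column normalisation, the function name `random_walk_with_restart`, and the toy similarity and association matrices are all assumptions made for illustration. The abstract does not specify how features are built from the walk, so the sketch stops at the per-disease steady-state scores that such features could plausibly be derived from before being passed to a logistic regression classifier.

```python
import numpy as np


def random_walk_with_restart(similarity, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`.

    similarity: (n, n) non-negative circRNA similarity matrix.
    seeds:      length-n 0/1 vector marking circRNAs known to be
                associated with one disease.
    restart:    restart probability (assumed value, not from the paper).
    """
    similarity = np.asarray(similarity, dtype=float)
    seeds = np.asarray(seeds, dtype=float)

    # A disease with no known circRNAs has no seed nodes to walk from.
    if seeds.sum() == 0:
        return np.zeros_like(seeds)

    # Column-normalise so each column is a transition distribution.
    col_sums = similarity.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    transition = similarity / col_sums

    p0 = seeds / seeds.sum()  # uniform start over the seed circRNAs
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy data (hypothetical): 4 circRNAs, 2 diseases.
S = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])
A = np.array([
    [1, 0],
    [0, 0],
    [0, 1],
    [0, 0],
], dtype=float)  # rows: circRNAs, columns: diseases

# One walk per disease, seeded with that disease's known circRNAs.
scores = np.column_stack(
    [random_walk_with_restart(S, A[:, j]) for j in range(A.shape[1])]
)
print(np.round(scores, 3))
```

In this toy network, circRNA 1 is most similar to the seed of the first disease and circRNA 3 to the seed of the second, so each receives a higher score for the corresponding disease than the more distant unlabelled circRNA does.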