Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac083.
As ribonucleic acid sequencing (RNA-seq) and transcript assembly analysis have continued to improve, a novel topology of RNA transcript, called circular RNA (circRNA), was uncovered in the last decade. Recently, researchers have revealed that circRNAs compete with messenger RNA (mRNA) and long noncoding RNA for binding to microRNA in gene regulation. circRNA is therefore assumed to be associated with complex diseases, and discovering these relationships would contribute to medical research. However, identifying associations between circRNAs and diseases in vitro is time-consuming and usually lacks direction. In recent years, a growing number of associations have been verified by experiments. Hence, we propose a computational method named identifying circRNA-disease association based on graph representation learning (iGRLCDA) to predict potential circRNA-disease associations, which uses a deep learning model combining a graph convolution network (GCN) and graph factorization (GF). In detail, iGRLCDA first derives hidden features of the known circRNA-disease associations using the Gaussian interaction profile (GIP) kernel combined with disease semantic information to form a numeric descriptor. It then uses the deep learning model of GCN and GF to extract hidden features from the descriptor. Finally, a random forest classifier is introduced to identify potential circRNA-disease associations. In five-fold cross-validation on the gold-standard data, iGRLCDA is strongly competitive with other state-of-the-art prediction models, achieving an average area under the receiver operating characteristic curve of 0.9289 and an area under the precision-recall curve of 0.9377. In a review of the relevant literature, 22 of the top 30 predicted circRNA-disease associations were reported in recently published papers.
These results suggest that iGRLCDA can provide reliable circRNA-disease associations for medical research and reduce the blindness of wet-lab experiments.
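The abstract names the GIP kernel but does not state its formula. As an illustration only, the following minimal sketch uses the commonly used definition of the kernel, K(i, j) = exp(-γ‖IP(i) − IP(j)‖²), where IP(i) is row i of a binary association matrix and γ is a bandwidth γ′ divided by the mean squared norm of the profiles. The function name `gip_kernel`, the default γ′ = 1 and the toy matrix are assumptions made for this example; they are not taken from iGRLCDA itself.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile similarity between the rows of `profiles`.

    `profiles` is a binary association matrix (one row per entity).
    `gamma_prime` is an assumed bandwidth; 1.0 is a common default.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("the association matrix contains no known associations")
    # Normalise the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (profiles @ profiles.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Toy example (hypothetical data): rows are circRNAs, columns are diseases.
associations = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
])
circ_sim = gip_kernel(associations)       # circRNA-circRNA similarity
disease_sim = gip_kernel(associations.T)  # disease-disease similarity
```

In this toy matrix the first two circRNAs share an identical interaction profile, so their similarity is 1, while the third has no disease in common with them and receives a much lower score. In a pipeline like the one described, such similarity matrices would be combined with disease semantic information before graph-based feature extraction; that combination is not shown here.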