The School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, 264209, China.
Comput Biol Med. 2022 Dec;151(Pt A):106289. doi: 10.1016/j.compbiomed.2022.106289. Epub 2022 Nov 11.
As a non-coding RNA molecule with a closed-loop structure, circular RNA (circRNA) shows tissue-specific and cell-specific expression patterns. It regulates disease development by modulating the expression of disease-related genes, so exploring circRNA-disease relationships can reveal the molecular mechanisms of disease pathogenesis. However, biological experiments for detecting circRNA-disease associations are time-consuming and laborious. Constrained by the sparsity of known circRNA-disease associations, existing algorithms cannot obtain relatively complete structural information with which to represent features accurately. To address this, this paper proposes a new predictor, VGAERF, which combines a Variational Graph Auto-Encoder (VGAE) with a Random Forest (RF). First, a circRNA homogeneous graph and a disease homogeneous graph are constructed from Gaussian interaction profile (GIP) kernel similarity, semantic similarity, and known circRNA-disease associations. Two VGAEs with the same architecture then extract higher-order features by encoding and decoding the input graphs. To further increase the completeness of the network structure information, the deep features obtained from the two VGAEs are summed and used to train an RF, chosen for its ability to handle sparse data, which performs the prediction task. On the independent test set, the proposed method achieves an Area Under the ROC Curve (AUC), accuracy, and Area Under the PR Curve (AUPR) of 0.9803, 0.9345, and 0.9894, respectively. On the same dataset, the AUC, accuracy, and AUPR of VGAERF are 2.09%, 5.93%, and 1.86% higher than those of the best-performing competing method (AEDNN). VGAERF is expected to provide significant information for deciphering the molecular mechanisms of circRNA-disease associations and to promote the diagnosis of circRNA-related diseases.
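The GIP kernel similarity used to build the homogeneous graphs can be sketched in its commonly used form. For circRNAs, the interaction profile IP(c_i) is row i of the circRNA-disease association matrix A. The similarity is GIP(c_i, c_j) = exp(-γ_c ||IP(c_i) - IP(c_j)||²), with γ_c = γ'_c / ((1/n_c) Σ_k ||IP(c_k)||²). Disease similarity is computed the same way on the columns of A. The abstract does not give VGAERF's bandwidth setting, so γ' = 1 below is an assumption. The function name `gip_kernel` is hypothetical.

```python
import numpy as np


def gip_kernel(profiles: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """GIP kernel similarity between the rows of `profiles`.

    Each row is one entity's interaction profile (e.g. a circRNA's row of the
    circRNA-disease association matrix). The bandwidth is normalised by the
    mean squared profile norm; gamma_prime = 1.0 is an assumed default.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        # No known associations at all: every pairwise distance is zero.
        n = profiles.shape[0]
        return np.ones((n, n))
    gamma = gamma_prime / mean_sq
    # Pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding
    return np.exp(-gamma * sq_dists)
```

With an association matrix `A` (circRNAs as rows, diseases as columns), `gip_kernel(A)` gives circRNA similarity and `gip_kernel(A.T)` gives disease similarity. How VGAERF merges these with disease semantic similarity is not described here, so that step is left out.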
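The VGAE step ("encoding and decoding of input graph structures") follows the standard variational graph auto-encoder pattern. A shared GCN layer feeds two GCN heads that give the mean and log standard deviation of the latent embeddings. Embeddings are sampled with the reparameterisation trick and decoded with an inner product. The sketch below is a NumPy forward pass plus the negative-ELBO loss. The weights are random and training is omitted; in practice the loss would be minimised with an autodiff framework. The input features, layer sizes, and the unweighted reconstruction loss are assumptions and do not come from the paper. All function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN propagation matrix D^-1/2 (A + I) D^-1/2 for a symmetric adjacency A."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x: np.ndarray) -> np.ndarray:
    # Clipping avoids overflow in exp for large |x|.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))


def vgae_forward(adj, features, w0, w_mu, w_logstd, rng):
    """One VGAE forward pass: GCN encoder, reparameterisation, inner-product decoder."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ features @ w0, 0.0)   # shared first GCN layer + ReLU
    mu = a_norm @ hidden @ w_mu                        # mean of q(Z | X, A)
    logstd = a_norm @ hidden @ w_logstd                # log std of q(Z | X, A)
    z = mu + rng.standard_normal(mu.shape) * np.exp(logstd)  # reparameterisation trick
    recon = sigmoid(z @ z.T)                           # p(A_ij = 1) = sigmoid(z_i . z_j)
    return z, mu, logstd, recon


def vgae_loss(adj, recon, mu, logstd):
    """Negative ELBO: mean BCE against (A + I) plus mean per-node KL to N(0, I)."""
    eps = 1e-9
    target = np.clip(adj + np.eye(adj.shape[0]), 0.0, 1.0)
    bce = -np.mean(target * np.log(recon + eps) + (1 - target) * np.log(1 - recon + eps))
    kl = -0.5 * np.mean(np.sum(1 + 2 * logstd - mu ** 2 - np.exp(2 * logstd), axis=1))
    return bce + kl
```

After training, `mu` (or a sample `z`) serves as each node's higher-order feature. Running one such model on the circRNA graph and one on the disease graph, with the same layer sizes, matches the "VGAEs with the same structure" in the abstract.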
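The abstract states that the deep features from the two VGAEs are summed before the RF is trained, but it does not say how circRNA-disease pairs are encoded or how negatives are chosen. The sketch below makes two assumptions. First, each candidate pair (c_i, d_j) is represented by z_{c_i} + z_{d_j}, which requires both VGAEs to produce embeddings of the same size. Second, negatives are drawn at random from unlabelled pairs, equal in number to the known positives. The classifier is scikit-learn's `RandomForestClassifier`. The helper names and the `n_estimators` default are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def pair_features(z_circ: np.ndarray, z_dis: np.ndarray, pairs: np.ndarray) -> np.ndarray:
    """Assumed pair encoding: element-wise sum of circRNA and disease embeddings."""
    pairs = np.asarray(pairs)
    return z_circ[pairs[:, 0]] + z_dis[pairs[:, 1]]


def build_training_set(assoc, z_circ, z_dis, rng):
    """Known associations as positives; equally many random unknown pairs as negatives (assumed)."""
    pos = np.argwhere(assoc == 1)
    unknown = np.argwhere(assoc == 0)
    neg = unknown[rng.choice(len(unknown), size=len(pos), replace=False)]
    pairs = np.vstack([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return pair_features(z_circ, z_dis, pairs), labels


def train_rf(features, labels, n_estimators=200, seed=0):
    """Fit a random forest on the summed pair features."""
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    return clf.fit(features, labels)
```

Scores for unseen pairs would then come from `train_rf(X, y).predict_proba(pair_features(z_circ, z_dis, candidate_pairs))[:, 1]`. Column 1 is the positive class because the labels are 0 and 1.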
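The three reported metrics (AUC, accuracy, AUPR) can be computed from predicted scores with scikit-learn. In this sketch, the 0.5 decision threshold for accuracy is an assumption. The use of average precision as the AUPR estimate is also an assumption: average precision is a step-wise summary of the PR curve rather than a trapezoidal area, so the two can differ slightly. The function name `evaluate` is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score


def evaluate(y_true, y_score, threshold=0.5):
    """AUC, accuracy at a fixed threshold, and average precision as an AUPR estimate."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "accuracy": accuracy_score(y_true, (y_score >= threshold).astype(int)),
        "AUPR": average_precision_score(y_true, y_score),
    }
```

For example, `evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives an AUC of 0.75, an accuracy of 0.75, and an average precision of about 0.833.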