College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China.
Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.
PLoS Comput Biol. 2020 May 20;16(5):e1007568. doi: 10.1371/journal.pcbi.1007568. eCollection 2020 May.
Accumulating evidence indicates that circular RNAs (circRNAs) are widely involved in the occurrence and development of diseases. Identifying associations between circRNAs and diseases plays a crucial role in exploring the pathogenesis of complex diseases and in improving their diagnosis and treatment. However, because the mechanisms linking circRNAs and diseases are complex, discovering new circRNA-disease associations through biological experiments is expensive and time-consuming. There is therefore an increasingly urgent need for computational methods that predict novel circRNA-disease associations. In this study, we propose a computational method called GCNCDA, based on the deep learning algorithm Fast learning with Graph Convolutional Networks (FastGCN), to predict potential disease-associated circRNAs. Specifically, the method first forms a unified descriptor by fusing disease semantic similarity information with disease and circRNA Gaussian Interaction Profile (GIP) kernel similarity information derived from known circRNA-disease associations. The FastGCN algorithm is then used to extract the high-level features contained in the fused descriptor. Finally, new circRNA-disease associations are predicted by the Forest by Penalizing Attributes (Forest PA) classifier. In 5-fold cross-validation on the circR2Disease benchmark dataset, GCNCDA achieved 91.2% accuracy and 92.78% sensitivity, with an AUC of 90.90%. In comparisons with different classifier models, different feature extraction models and other state-of-the-art methods, GCNCDA is highly competitive. Furthermore, we conducted case studies on breast cancer, glioma and colorectal cancer. Of the top 20 candidate circRNAs with the highest prediction scores, 16, 15 and 17, respectively, were confirmed by relevant literature and databases.
These results suggest that GCNCDA can effectively predict potential circRNA-disease associations and provide highly credible candidates for biological experiments.
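As an illustration of the GIP kernel similarity mentioned above, the following is a minimal sketch in plain Python. It assumes the commonly used formulation, in which the similarity of two interaction profiles is exp(-gamma * squared distance) and gamma is a bandwidth parameter divided by the mean squared norm of all profiles. The abstract does not state the exact formula or parameter values used in GCNCDA, so the function name `gip_kernel`, the parameter `gamma_prime` and the toy association matrix are hypothetical and serve only as an example.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian Interaction Profile kernel similarity between the rows of a 0/1 matrix.

    Assumed formulation (not taken from the abstract):
        K(i, j) = exp(-gamma * ||row_i - row_j||^2)
        gamma   = gamma_prime / mean_i(||row_i||^2)
    """
    n = len(profiles)
    # Squared norm of each profile; for 0/1 rows this is the count of known associations.
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq_norm

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared Euclidean distance between the two interaction profiles.
            d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * d2)
    return sim


# Hypothetical toy association matrix: rows are circRNAs, columns are diseases,
# and 1 marks a known circRNA-disease association.
A = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

circ_sim = gip_kernel(A)                                  # circRNA-circRNA similarity
disease_sim = gip_kernel([list(c) for c in zip(*A)])      # disease-disease similarity
```

The same function serves both sides: circRNA similarity is computed from the rows of the association matrix and disease similarity from its columns (the rows of the transpose). In a pipeline like the one described, these matrices would then be fused with the disease semantic similarity to form the descriptor; that fusion step is not sketched here because the abstract does not specify it.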