School of Computer Science and Engineering, Guilin University of Technology, Guilin, 541004, China.
Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, 541004, China.
BMC Med Inform Decis Mak. 2024 Jun 6;24(1):159. doi: 10.1186/s12911-024-02564-6.
BACKGROUND: Biological validation in vitro or in vivo is time-consuming and labor-intensive, whereas computational models can quickly provide high-quality, targeted candidates. Existing computational models, however, are limited in how effectively they use sparse local structural information to predict circRNA-disease associations accurately. This study addresses this challenge with a proposed method, CDA-DGRL (Prediction of CircRNA-Disease Association based on Double-line Graph Representation Learning), a deep learning framework that leverages graph networks and a double-line representation model integrating graph node features.
METHOD: CDA-DGRL comprises five key steps. First, diverse biological information is integrated to compute integrated similarities among circRNAs and diseases, from which a heterogeneous network specific to circRNA-disease associations is constructed. Second, circRNA and disease node features are derived using sparse autoencoders. Third, a graph convolutional neural network takes the circRNA-disease heterogeneous network together with the node features as input to capture the local graph network structure. Fourth, node2vec performs depth-first sampling of the circRNA-disease heterogeneous network to capture the global graph network structure, addressing issues associated with sparse raw data. Finally, the local and global graph network structures are fused and fed into an extra trees classifier to identify potential circRNA-disease associations.
RESULTS: In five-fold cross-validation on the circR2Disease dataset, CDA-DGRL achieved an AUC of 0.9866 and an AUPR of 0.9897, outperforming existing state-of-the-art models. Notably, the extra trees (extremely randomized trees) classifier employed in this model outperformed the other machine learning classifiers.
CONCLUSION: CDA-DGRL is thus a promising method for reliably identifying circRNA-disease associations and may reduce the need for extensive traditional biological experiments. The source code and data for this study are available at https://github.com/zywait/CDA-DGRL.
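The depth-first sampling step described in METHOD can be illustrated with a node2vec-style biased random walk. The following is a minimal sketch, not the authors' implementation (which is available in the repository above). The toy graph, the function names `build_graph` and `node2vec_walk`, and the values `p=1.0` and `q=0.5` are illustrative assumptions. In node2vec, an in-out parameter q below 1 biases the walker toward nodes farther from the previous node, which gives depth-first-like exploration.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def node2vec_walk(adj, start, length, p=1.0, q=0.5, rng=None):
    """Sample one second-order biased walk of up to `length` nodes.

    Unnormalized weight of stepping from `cur` to `nxt`, given `prev`:
      1/p if nxt == prev               (return to the previous node)
      1   if nxt is adjacent to prev   (stay close to the previous node)
      1/q otherwise                    (move outward; favored when q < 1)
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy heterogeneous network: circRNA-disease association
# edges plus one circRNA-circRNA similarity edge (assumed data).
toy_edges = [
    ("circ1", "dis1"),
    ("circ2", "dis1"),
    ("circ2", "dis2"),
    ("circ3", "dis2"),
    ("circ1", "circ2"),
]
toy_adj = build_graph(toy_edges)
toy_rng = random.Random(0)
toy_walks = [
    node2vec_walk(toy_adj, node, length=6, p=1.0, q=0.5, rng=toy_rng)
    for node in sorted(toy_adj)
]
for w in toy_walks:
    print(" -> ".join(w))
```

In node2vec, walks such as these are subsequently passed to a skip-gram model to learn node embeddings; that step is omitted here.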