Predicting noncoding RNA and disease associations using multigraph contrastive learning.

Author information

Sun Si-Lin, Jiang Yue-Yi, Yang Jun-Ping, Xiu Yu-Han, Bilal Anas, Long Hai-Xia

Affiliations

College of Information Science Technology, Hainan Normal University, Haikou, 571158, China.

Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, 571158, China.

Publication information

Sci Rep. 2025 Jan 2;15(1):230. doi: 10.1038/s41598-024-81862-5.

Abstract

MicroRNAs (miRNAs) and long noncoding RNAs (lncRNAs) are two essential noncoding RNAs. Predicting associations between noncoding RNAs and diseases can significantly improve the accuracy of early diagnosis. With continuing breakthroughs in artificial intelligence, researchers increasingly use deep learning methods to predict these associations. Nevertheless, most existing methods face two major issues: low prediction accuracy, and the ability to predict only a single type of noncoding RNA-disease association. To address these challenges, this paper proposes a method called K-Means and multigraph Contrastive Learning for predicting associations among miRNAs, lncRNAs, and diseases (K-MGCMLD). The K-MGCMLD model consists of four main steps. The first step constructs a heterogeneous graph. The second step downsamples with the K-means clustering algorithm to balance the positive and negative samples. The third step uses an encoder with a Graph Convolutional Network (GCN) architecture to extract embedding vectors; multigraph contrastive learning, comprising both local and global graph contrastive learning, helps the embedding vectors better capture the latent topological features of the graph. The fourth step reconstructs features from the balanced positive and negative samples and the embedding vectors, then feeds them into an XGBoost classifier for multi-association classification. Experimental results show AUC values of 0.9542 for miRNA-disease associations, 0.9603 for lncRNA-disease associations, and 0.9687 for lncRNA-miRNA associations. In addition, case studies with K-MGCMLD validated all of the top 30 miRNAs predicted to be associated with lung cancer and with Alzheimer's disease.
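
The abstract does not spell out how K-means is used to balance the samples in the second step. One common reading, shown here as a minimal sketch and not as the authors' implementation, is to cluster the unlabeled (candidate negative) pairs and then draw roughly equal numbers from each cluster until the count matches the known positives. The function names `kmeans` and `downsample_negatives`, the feature representation (plain tuples of floats), the cluster count `k`, and the round-robin draw are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float tuples.

    Returns one cluster label per point. Hypothetical helper, not from the paper.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def downsample_negatives(neg_features, n_positive, k=3, seed=0):
    """Select up to n_positive negatives, spread evenly across k clusters.

    Returns sorted indices into neg_features. Assumed strategy, see lead-in.
    """
    labels = kmeans(neg_features, k, seed=seed)
    rng = random.Random(seed)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    chosen = []
    # Round-robin over clusters so each contributes about the same number.
    while len(chosen) < n_positive and any(clusters):
        for members in clusters:
            if members and len(chosen) < n_positive:
                chosen.append(members.pop())
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: 60 unlabeled pairs in a 2-D feature space, 12 known positives.
    negatives = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
                 for cx, cy in [(0, 0), (3, 3), (0, 3)] for _ in range(20)]
    kept = downsample_negatives(negatives, n_positive=12, k=3)
    print(len(kept), "negatives kept:", kept)
```

Drawing from every cluster, instead of sampling negatives uniformly at random, keeps the retained negatives spread across the feature space. In a full pipeline the retained pairs, together with the positives and their learned embeddings, would form the training set for the downstream classifier.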

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa19/11695719/0ea68ae07130/41598_2024_81862_Fig1_HTML.jpg
