Predicting Pseudogene-miRNA Associations Based on Feature Fusion and Graph Auto-Encoder.

Authors

Zhou Shijia, Sun Weicheng, Zhang Ping, Li Li

Affiliations

Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China.

Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China.

Publication information

Front Genet. 2021 Dec 13;12:781277. doi: 10.3389/fgene.2021.781277. eCollection 2021.

Abstract

Pseudogenes were originally regarded as non-functional components scattered through the genome during evolution. Recent studies have shown that pseudogenes can be transcribed into long non-coding RNAs and play key roles at multiple functional levels in different physiological and pathological processes. MicroRNAs (miRNAs) are a class of non-coding RNA that plays important regulatory roles in cells. Numerous studies have shown that pseudogenes and miRNAs interact and, together with mRNAs, form competing endogenous RNA (ceRNA) networks that regulate biological processes and are involved in diseases. Exploring the associations between pseudogenes and miRNAs will facilitate the clinical diagnosis of some diseases. Here, we propose a prediction model, PMGAE (Pseudogene-MiRNA association prediction based on the Graph Auto-Encoder), which combines feature fusion, a graph auto-encoder (GAE), and eXtreme Gradient Boosting (XGBoost). First, we calculated three types of similarity between nodes (Jaccard similarity, cosine similarity, and Pearson similarity) based on the biological characteristics of pseudogenes and miRNAs. Next, we fused these similarities to construct a similarity profile that serves as the initial representation feature of each node. Then, we aggregated the similarity profiles and the associations of the nodes through a GAE to obtain a low-dimensional representation vector for each node. Finally, we fed these representation vectors into an XGBoost classifier to predict new pseudogene-miRNA associations (PMAs). Five-fold cross-validation shows that PMGAE achieves a mean AUC of 0.8634 and a mean AUPR of 0.8966. Case studies further substantiated the reliability of PMGAE for mining PMAs and for studying endogenous RNA networks in relation to diseases.
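The first two steps of the pipeline (computing Jaccard, cosine, and Pearson similarities between nodes and fusing them into a similarity profile) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the input features or the fusion rule, so the sketch assumes that each node is described by a binary association vector and that fusion is a plain element-wise average. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def jaccard_similarity(features):
    """Pairwise Jaccard similarity between the binarized rows of `features`."""
    binary = (np.asarray(features) > 0).astype(float)
    intersection = binary @ binary.T
    row_sums = binary.sum(axis=1)
    union = row_sums[:, None] + row_sums[None, :] - intersection
    # Pairs of all-zero rows have an empty union; define their similarity as 0.
    return np.divide(intersection, union,
                     out=np.zeros_like(intersection), where=union > 0)


def cosine_similarity(features):
    """Pairwise cosine similarity between the rows of `features`."""
    x = np.asarray(features, dtype=float)
    dot = x @ x.T
    norms = np.linalg.norm(x, axis=1)
    denom = norms[:, None] * norms[None, :]
    # Zero-norm rows would divide by zero; define their similarity as 0.
    return np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)


def pearson_similarity(features):
    """Pairwise Pearson correlation, i.e. cosine similarity of mean-centered rows."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)
    return cosine_similarity(centered)


def fuse_similarities(matrices):
    """Assumed fusion rule: element-wise mean of the similarity matrices."""
    return np.mean(np.stack(matrices), axis=0)


# Hypothetical toy data: 3 pseudogenes (rows) x 4 miRNAs (columns).
assoc = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 1]])

profile = fuse_similarities([
    jaccard_similarity(assoc),
    cosine_similarity(assoc),
    pearson_similarity(assoc),
])
print(profile.shape)  # one fused similarity row per pseudogene
```

Each row of `profile` would then serve as the initial feature vector of the corresponding node before the GAE step. Applying the same functions to the transposed matrix would give miRNA-side profiles under the same assumptions.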

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01fc/8710693/9f5d76c9764d/fgene-12-781277-g003.jpg
