Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations.

Authors

Yao Dengju, Zhang Tao, Zhan Xiaojuan, Zhang Shuli, Zhan Xiaorong, Zhang Chao

Affiliations

School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China.

College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China.

Publication information

Front Genet. 2022 Aug 24;13:995532. doi: 10.3389/fgene.2022.995532. eCollection 2022.

Abstract

A growing body of evidence shows that aberrant expression of long non-coding RNAs (lncRNAs) is associated with a variety of human diseases. Accurate identification of disease-related lncRNAs can therefore help to understand lncRNA expression at the molecular level and to explore more effective treatments. Many lncRNA-disease association prediction models have been proposed, but recognizing unknown lncRNA-disease associations remains a challenge. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. First, geometric complement heterogeneous information is used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Second, lncRNA and disease features, each consisting of the respective similarity coefficients, are fused into the input feature space. Third, an autoencoder projects the raw high-dimensional features into a low-dimensional space to learn representations of lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features are fused into the input feature space to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the model achieves an AUC (area under the receiver operating characteristic curve) of 0.9897 and an AUPR (area under the precision-recall curve) of 0.7040, outperforming several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that the model predicts disease-related lncRNAs well.
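The two metrics reported above can diverge sharply when known associations are rare relative to unlabeled pairs, which is typical for association prediction. The following is a minimal sketch of how each is computed, not the authors' evaluation code. The function names and the toy labels and scores are hypothetical, and average precision is used as one common step-wise estimate of AUPR; the abstract does not state which estimator the paper uses.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = 0
    prev_recall = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        # Consume every sample tied at this score before evaluating the threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        precision = tp / j          # j samples are predicted positive so far
        recall = tp / n_pos
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area


# Hypothetical toy data: 2 known associations among 10 candidate pairs.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1, 0.4, 0.35, 0.25, 0.15]

print(f"AUC  = {roc_auc(labels, scores):.4f}")
print(f"AUPR = {average_precision(labels, scores):.4f}")
```

On this toy set a single negative ranked above one positive costs little AUC (15 of 16 pairs are still ordered correctly) but lowers precision at full recall to 2/3, so the AUPR estimate drops further than the AUC does.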
