

Computational prediction of disease related lncRNAs using machine learning.

Author affiliations

Computational Biology Research Lab, Department of Computer Science, National University of Computer and Emerging Sciences, NUCES-FAST, Islamabad, Pakistan.

National Center for Bioinformatics (NCB), Quaid-i-Azam University, Islamabad, Pakistan.

Publication information

Sci Rep. 2023 Jan 16;13(1):806. doi: 10.1038/s41598-023-27680-7.

Abstract

Long non-coding RNAs (lncRNAs), once considered transcriptional noise, are now a focus of current research. LncRNAs play a major role in regulating biological processes such as imprinting, cell differentiation, and splicing, and mutations in lncRNAs are involved in various complex diseases. Identifying lncRNA-disease associations has attracted considerable attention, because predicting them efficiently will lead to better disease treatment. In this study, we developed a machine learning model that predicts disease-related lncRNAs by combining sequence-based and structure-based features. SVM and Random Forest classifiers were trained on these features. We compared our method with the state of the art and obtained the highest F1 score, 76%, with the SVM classifier. Moreover, this study overcomes two serious limitations of the previously reported method: the lack of redundancy checking and the use of oversampling to balance the positive and negative classes. Our method achieves improved performance among the machine learning models reported for lncRNA-disease association prediction. Combining multiple features, in particular lncRNA sequence mutations, contributes significantly to the prediction of disease-related lncRNAs.
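To make two terms from the abstract concrete, the following is a minimal sketch of one possible sequence-based feature and of the F1 score used for evaluation. The abstract does not say which sequence features the authors used, so the k-mer frequency vector here is an assumption chosen purely for illustration. The function names `kmer_frequencies` and `f1_score` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Return normalized k-mer frequencies over the RNA alphabet ACGU.

    Illustrative feature only (an assumption, not the paper's feature set).
    DNA-style 'T' is mapped to 'U'; k-mers with other symbols are ignored.
    """
    seq = sequence.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), positive label = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    features = kmer_frequencies("ACGUACGU", k=2)
    print(len(features))  # one value per possible dinucleotide
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

A feature vector of this kind could be concatenated with structure-based features before being passed to a classifier. F1 is a natural metric when positive and negative classes differ in size, because it ignores true negatives.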


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fdba/9842610/2d7a00567332/41598_2023_27680_Fig1_HTML.jpg
