LncSTPred: a predictive model of lncRNA subcellular localization and decipherment of the biological determinants influencing localization.

Author information

Hu Si-Le, Chen Ying-Li, Zhang Lu-Qiang, Bai Hui, Yang Jia-Hong, Li Qian-Zhong

Affiliations

School of Physical Science and Technology, Inner Mongolia University, Hohhot, China.

The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot, China.

Publication information

Front Mol Biosci. 2024 Sep 5;11:1452142. doi: 10.3389/fmolb.2024.1452142. eCollection 2024.

Abstract

INTRODUCTION

Long non-coding RNAs (lncRNAs) play crucial roles in genetic markers, genome rearrangement, chromatin modifications, and other biological processes. Increasing evidence suggests that lncRNA functions are closely related to their subcellular localization. However, lncRNAs are unevenly distributed across subcellular localizations: more than ten times as many lncRNAs are located in the nucleus as in the exosome.

METHODS

In this study, we propose a new oversampling method to construct a predictive dataset and develop a predictive model called LncSTPred. The model applies an improved AdaBoost algorithm to subcellular localization prediction, using 3-mer, 3-RF sequence, and minimum free energy structure features.
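
As an illustration of the 3-mer features mentioned above, the following is a minimal sketch of how normalized k-mer frequencies could be computed from an RNA sequence. It is not the authors' implementation: the function names `all_kmers` and `kmer_frequencies`, the A/C/G/U alphabet, the conversion of T to U, and the choice to skip windows containing ambiguous bases are all assumptions made for this example. The 3-RF and minimum free energy features are not covered.

```python
from itertools import product


def all_kmers(k=3, alphabet="ACGU"):
    """Return every k-mer over the alphabet in a fixed (lexicographic) order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Hypothetical helper: normalized k-mer frequencies of one sequence.

    Assumptions (not taken from the paper): DNA-style T is read as U, and
    windows containing characters outside the alphabet (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = all_kmers(k, alphabet)
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Divide by the number of counted windows so the vector sums to 1.
    return [counts[m] / total if total else 0.0 for m in kmers]


features = kmer_frequencies("AUGAUG")
```

With k = 3 and a four-letter alphabet, each sequence maps to a 64-dimensional vector, which can then be concatenated with other feature groups before training a classifier.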

RESULTS AND DISCUSSION

Our improved AdaBoost algorithm achieved better prediction accuracy for lncRNA subcellular localization. In addition, we evaluated feature importance using the F-score and analyzed the influence of highly relevant features on lncRNAs. Our study shows that the ANA features may be a key factor for predicting lncRNA subcellular localization, which correlates with the composition of stems and loops in the secondary structure of lncRNAs.
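
The abstract does not state which F-score definition was used. As a hedged sketch, the code below uses the common two-class form: the squared deviations of the two class means from the overall mean, divided by the sum of the within-class sample variances. The function name `f_score` and the two-class setup are assumptions for illustration; the paper's multi-location setting may use a different variant.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of a single feature (illustrative, assumed form).

    pos, neg: lists of the feature's values in the positive and negative
    class; each list needs at least two values for the sample variance.
    """
    overall = mean(pos + neg)
    # Between-class term: how far each class mean sits from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sample variances (n - 1 in the denominator).
    within = variance(pos) + variance(neg)
    return between / within if within else float("inf")


# Toy values for one feature in two hypothetical classes.
separated = f_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
overlapping = f_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

A larger score means the feature separates the two classes more clearly relative to its spread within each class, so features can be ranked by this value.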

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe4f/11411566/f1828c262b6f/fmolb-11-1452142-g001.jpg