Prediction of Bacterial sRNAs Using Sequence-Derived Features and Machine Learning.

Authors

Jha Tony, Mendel Jovinna, Cho Hyuk, Choudhary Madhusudan

Affiliations

Department of Mathematics, University of California, Berkeley, Berkeley, CA, USA.

Department of Biological Sciences, Sam Houston State University, Huntsville, TX, USA.

Publication

Bioinform Biol Insights. 2022 Aug 18;16:11779322221118335. doi: 10.1177/11779322221118335. eCollection 2022.

DOI: 10.1177/11779322221118335
PMID: 36016866
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9397377/
Abstract

Small ribonucleic acid (sRNA) sequences are noncoding RNA (ncRNA) sequences, 50 to 500 nucleotides long, that play an important role in regulating transcription and translation within a bacterial cell. Identifying sRNA sequences within an organism's genome is therefore essential for understanding the impact of these RNA molecules on cellular processes. Recently, numerous machine learning models have been applied to predict sRNAs within bacterial genomes. In this study, we treated sRNA prediction as an imbalanced binary classification problem, distinguishing the minority positive sRNAs from the majority negative instances, and performed a comparative study with six learning algorithms and seven assessment metrics. First, we collected numerical feature groups extracted from known sRNAs previously identified in the Salmonella Typhimurium LT2 (SLT2) and Escherichia coli K12 (E. coli K12) genomes. Second, as a preliminary study, we characterized the sRNA-size distribution with a conformity test for Benford's law. Third, we applied six traditional classification algorithms to the sRNA features and assessed classification performance with seven metrics, varying the positive-to-negative instance ratio and using stratified 10-fold cross-validation. We revisited important individual features and feature groups and found that, in terms of the Area Under the Precision-Recall curve (AUPR), classification with combined features performs better than classification with either an individual feature or a single feature group. We reconfirmed that AUPR properly measures classification performance on imbalanced data across varying imbalance ratios, which is consistent with previous studies on classification metrics for imbalanced data. Overall, eXtreme Gradient Boosting (XGBoost), even without optimized hyperparameter values, performed better than the other five algorithms run with specific optimal parameter settings.
As future work, we plan to extend XGBoost to a large set of published sRNAs in bacterial genomes and compare its classification performance with that of recent machine learning models.
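The abstract does not say which conformity statistic was used for the Benford's law check on sRNA sizes. As a minimal sketch, assuming a Pearson chi-square goodness-of-fit test on the leading digit of each sRNA length, the check could look like the following; the function names and the 5% significance threshold are illustrative assumptions, not the authors' code. Benford's law predicts leading digit d with probability log10(1 + 1/d), so about 30.1% of values should start with 1 and about 4.6% with 9.

```python
import math
from collections import Counter

# Benford's law: expected proportion of values whose leading digit is d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at alpha = 0.05.
# The choice of alpha is an assumption for illustration.
CHI2_CRIT_8DF_05 = 15.507


def leading_digit(n: int) -> int:
    """Return the most significant decimal digit of a positive integer."""
    if n <= 0:
        raise ValueError("lengths must be positive integers")
    return int(str(n)[0])


def benford_chi_square(lengths: list[int]) -> tuple[float, bool]:
    """Compare observed leading digits of sequence lengths with Benford's law.

    Returns (statistic, conforms), where conforms is True when the Pearson
    chi-square statistic does not exceed the 5% critical value.
    """
    if not lengths:
        raise ValueError("need at least one length")
    counts = Counter(leading_digit(n) for n in lengths)
    total = len(lengths)
    stat = 0.0
    for digit, proportion in BENFORD.items():
        expected = total * proportion
        observed = counts.get(digit, 0)
        stat += (observed - expected) ** 2 / expected
    return stat, stat <= CHI2_CRIT_8DF_05
```

Calling `benford_chi_square(lengths)` on a list of sRNA lengths returns the statistic and whether it stays below the critical value. Other conformity measures, such as the mean absolute deviation between observed and expected digit proportions, could replace the chi-square statistic without changing the rest of the sketch.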

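The evaluation protocol the abstract describes, stratified 10-fold cross-validation scored by AUPR on imbalanced data, can be sketched without dependencies as below. This is an illustrative sketch, not the authors' implementation: `stratified_folds` and `average_precision` are hypothetical helper names, and AUPR is computed here as step-wise average precision, AP = Σ (R_n − R_{n−1}) P_n, which is the definition scikit-learn's `average_precision_score` uses; the paper may have used a different interpolation. Stratification keeps each fold's positive-to-negative ratio close to that of the full dataset, which matters when positives are scarce. AUPR suits imbalanced data because a random classifier's AUPR is roughly the positive-class fraction, whereas its ROC AUC stays near 0.5 regardless of the imbalance ratio.

```python
import random


def stratified_folds(y_true: list[int], k: int = 10, seed: int = 0) -> list[list[int]]:
    """Split instance indices into k folds that preserve the class ratio.

    Indices of each class are shuffled, then dealt round-robin across folds,
    so every fold receives nearly the same number of positives and negatives.
    """
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def average_precision(y_true: list[int], scores: list[float]) -> float:
    """Step-wise area under the precision-recall curve (labels are 0/1).

    Instances with tied scores are treated as one threshold, so ties do not
    depend on sort order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPR is undefined without positive instances")
    pairs = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every instance scored at this threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        # Add the precision-weighted gain in recall for this threshold.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

In a full run, each fold in turn serves as the test set while a classifier (XGBoost or any of the other algorithms) is trained on the remaining nine, and the per-fold AUPR values are averaged. In practice, scikit-learn's `StratifiedKFold` and `average_precision_score` cover the same two steps.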