Prioritizing bona fide bacterial small RNAs with machine learning classifiers.

Author information

Eppenhof Erik J J, Peña-Castillo Lourdes

Affiliations

Department of Artificial Intelligence, Radboud University Nijmegen, Nijmegen, Netherlands.

Department of Biology, Memorial University of Newfoundland, St. John's, Canada.

Publication information

PeerJ. 2019 Jan 24;7:e6304. doi: 10.7717/peerj.6304. eCollection 2019.

Abstract

Bacterial small RNAs (sRNAs) are involved in the control of several cellular processes. Hundreds of putative sRNAs have been identified in many bacterial species through RNA sequencing. The existence of putative sRNAs is usually validated by Northern blot analysis. However, the large number of novel putative sRNAs reported in the literature makes it impractical to validate each of them in the wet lab. In this work, we applied five machine learning approaches to construct twenty models to discriminate bona fide sRNAs from random genomic sequences in five bacterial species. Sequences were represented using seven features, including the free energy of their predicted secondary structure, their distances to the closest predicted promoter site and Rho-independent terminator, and their distance to the closest open reading frames (ORFs). To automatically calculate these features, we developed an sRNA Characterization Pipeline (sRNACharP). All seven features used in the classification task contributed positively to the performance of the predictive models. The best performing model obtained a median precision of 100% at 10% recall and of 64% at 40% recall across all five bacterial species, and it outperformed previously published approaches on two benchmark datasets in terms of precision and recall. Our results indicate that even though there is limited sRNA sequence conservation across different bacterial species, there are intrinsic features in the genomic context of sRNAs that are conserved across taxa. We show that these features are utilized by machine learning approaches to learn a species-independent model to prioritize bona fide bacterial sRNAs.
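The abstract reports model quality as precision at fixed recall levels (for example, precision at 10% recall). As a rough illustration of how such a figure can be read off a ranked list of candidates, the sketch below sorts candidates by classifier score and returns the precision at the first cutoff whose recall reaches the target. The function name `precision_at_recall`, the toy labels, and the toy scores are all hypothetical and are not taken from the paper or from sRNACharP; ties between scores are not treated specially.

```python
from typing import Sequence


def precision_at_recall(
    labels: Sequence[int], scores: Sequence[float], target_recall: float
) -> float:
    """Precision at the first rank cutoff whose recall reaches target_recall.

    labels: 1 for a bona fide sRNA, 0 for a random genomic sequence.
    scores: classifier scores, higher meaning "more likely bona fide".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank candidates from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            # Precision = true positives among the top-k candidates.
            return true_pos / k
    # Only reached if target_recall is greater than 1.
    return true_pos / len(ranked)


# Toy data (hypothetical): ten candidates, five of them bona fide sRNAs.
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

p_low = precision_at_recall(toy_labels, toy_scores, 0.2)
p_mid = precision_at_recall(toy_labels, toy_scores, 0.5)
print(p_low, p_mid)
```

Under this reading, high precision at low recall means that the very top of the ranking is almost entirely bona fide sRNAs, which is the property that matters when only a few candidates can be taken to Northern blot validation.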


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ddc/6348098/b7bc2e3d8e62/peerj-07-6304-g001.jpg
