Popcorn: prediction of short coding and noncoding genomic sequences in prokaryotes.

Authors

Kyrouz Alison, Liu Lian, Qin Lixin, Tjaden Brian

Affiliation

Department of Computer Science, Wellesley College, Wellesley, MA 02481, United States.

Publication

Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf250.

Abstract

SUMMARY

The most challenging prokaryotic genes to identify often correspond to short ORFs (sORFs) encoding small proteins or to noncoding RNAs. RNA-seq experiments commonly evince small transcripts that do not correspond to annotated genes and are candidates for novel coding sORFs or small regulatory RNAs, but it can be difficult to accurately assess whether the numerous small transcripts are coding or not. We present Popcorn (PrOkaryotic Prediction of Coding OR Noncoding), a novel machine learning method for determining whether prokaryotic sequences are coding or noncoding. We find that Popcorn is effective in distinguishing coding from noncoding sequences, including coding sORFs and noncoding RNAs.

AVAILABILITY AND IMPLEMENTATION

Freely available for use on the web at https://cs.wellesley.edu/~btjaden/Popcorn. Source code available at https://github.com/btjaden/Popcorn and https://doi.org/10.5281/zenodo.15120075.
