Combining diverse evidence for gene recognition in completely sequenced bacterial genomes.

Author information

Frishman D, Mironov A, Mewes H W, Gelfand M

Affiliation

Munich Information Center for Protein Sequences (MIPS) of the German National Center for Health and Environment (GSF), Am Klopferspitz 18a, 82152 Martinsried, Germany.

Publication information

Nucleic Acids Res. 1998 Jun 15;26(12):2941-7. doi: 10.1093/nar/26.12.2941.

Abstract

Analysis of a newly sequenced bacterial genome starts with identification of protein-coding genes. Functional assignment of proteins requires the exact knowledge of protein N-termini. We present a new program ORPHEUS that identifies candidate genes and accurately predicts gene starts. The analysis starts with a database similarity search and identification of reliable gene fragments. The latter are used to derive statistical characteristics of protein-coding regions and ribosome-binding sites and to predict the complete set of genes in the analyzed genome. In a test on Bacillus subtilis and Escherichia coli genomes, the program correctly identified 93.3% (resp. 96.3%) of experimentally annotated genes longer than 100 codons described in the PIR-International database, and for these genes 96.3% (83.9%) of starts were predicted exactly. Furthermore, 98.9% (99.1%) of genes longer than 100 codons annotated in GenBank were found, and 92.9% (75.7%) of predicted starts coincided with the feature table description. Finally, for the complete gene complements of B. subtilis and E. coli, including genes shorter than 100 codons, gene prediction accuracy was 88.9 and 87.1%, respectively, with 94.2 and 76.7% starts coinciding with the existing annotation.
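
The abstract's idea of deriving coding statistics from reliable gene fragments and then using them to evaluate candidate genes can be illustrated with a minimal sketch. This is not the ORPHEUS implementation: the function names, the uniform background model, the pseudocount, and the start codon set (ATG, GTG, TTG) are all assumptions made for illustration, and ribosome-binding-site scoring is left out entirely.

```python
import math
from collections import Counter

BASES = "ACGT"
# Assumed start codon set for illustration; not taken from the abstract.
START_CODONS = {"ATG", "GTG", "TTG"}


def codons(seq):
    """Split an in-frame sequence into complete codons."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_codon_log_odds(seed_fragments, pseudocount=1.0):
    """Estimate per-codon log-odds against a uniform background (1/64)
    from in-frame fragments assumed to be reliably coding."""
    counts = Counter(
        c for frag in seed_fragments for c in codons(frag)
        if set(c) <= set(BASES)  # ignore codons with ambiguous bases
    )
    total = sum(counts.values()) + 64 * pseudocount
    table = {}
    for a in BASES:
        for b in BASES:
            for c in BASES:
                codon = a + b + c
                freq = (counts[codon] + pseudocount) / total
                table[codon] = math.log(freq * 64)
    return table


def coding_score(seq, table):
    """Mean log-odds per codon; higher means more similar to the seeds."""
    cs = codons(seq)
    if not cs:
        return 0.0
    return sum(table.get(c, 0.0) for c in cs) / len(cs)


def candidate_starts(orf):
    """Offsets of in-frame candidate start codons within an ORF."""
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in START_CODONS]
```

In a sketch like this, the seed fragments would come from the similarity search step, and each candidate start returned by `candidate_starts` would then be ranked with additional evidence such as a ribosome-binding-site score, which is not modeled here.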
