Self-identification of protein-coding regions in microbial genomes.

Authors

Audic S, Claverie J M

Affiliation

Structural and Genetic Information Laboratory, Centre National de la Recherche Scientifique-EP.91, 31 rue Joseph Aiguier, Marseille F-13402, France.

Publication

Proc Natl Acad Sci U S A. 1998 Aug 18;95(17):10026-31. doi: 10.1073/pnas.95.17.10026.

Abstract

A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically perform the partition of genomic sequences into three subsets shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17, 123-133], no training set or prior knowledge of the statistical properties of the studied genome are required. This new method tolerates error rates of 1-2% and can process unassembled sequences. It is thus ideal for the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% with a totally automated and objective procedure.
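The abstract does not spell out the procedure, so what follows is only a minimal sketch of one plausible reading of "iterative Markov modeling", not the authors' implementation. It cuts a sequence into fixed-length windows, assigns each window to one of three classes at random, trains a Markov model per class, reassigns every window to the model that scores it highest, and repeats until the assignment stops changing. The function names and the `window`, `order`, `max_iter`, and `seed` parameters are all assumptions introduced for illustration.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(windows, order):
    """Estimate log transition probabilities of a fixed-order Markov model.

    Add-one smoothing is an assumption of this sketch, not taken from the paper.
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))
    for w in windows:
        for i in range(order, len(w)):
            context, base = w[i - order:i], w[i]
            if base in ALPHABET:
                counts[context][base] += 1
    model = {}
    for context, c in counts.items():
        total = sum(c.values())
        model[context] = {b: math.log(n / total) for b, n in c.items()}
    return model


def log_likelihood(window, model, order):
    """Score a window under a model; unseen contexts fall back to uniform."""
    uniform = math.log(1 / len(ALPHABET))
    score = 0.0
    for i in range(order, len(window)):
        score += model.get(window[i - order:i], {}).get(window[i], uniform)
    return score


def partition(sequence, n_classes=3, window=100, order=2, max_iter=50, seed=0):
    """Iteratively split a sequence into n_classes sets of windows."""
    rng = random.Random(seed)
    windows = [sequence[i:i + window]
               for i in range(0, len(sequence) - window + 1, window)]
    # Start from a random assignment: no training set is used.
    labels = [rng.randrange(n_classes) for _ in windows]
    for _ in range(max_iter):
        # Retrain one model per class from its current members.
        models = [
            train_markov([w for w, l in zip(windows, labels) if l == k], order)
            for k in range(n_classes)
        ]
        # Reassign each window to the model under which it is most likely.
        new_labels = [
            max(range(n_classes),
                key=lambda k: log_likelihood(w, models[k], order))
            for w in windows
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return windows, labels
```

Calling `partition(genome_string)` returns the windows and one class label per window. Deciding which label corresponds to coding, opposite-strand coding, or noncoding DNA is a separate step that this sketch does not attempt.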
