A comparative genomic method for computational identification of prokaryotic translation initiation sites.

Author information

Walker Megon, Pavlovic Vladimir, Kasif Simon

Affiliation

Bioinformatics Program, Boston University, Boston, MA 02215, USA.

Publication information

Nucleic Acids Res. 2002 Jul 15;30(14):3181-91. doi: 10.1093/nar/gkf423.

DOI:10.1093/nar/gkf423

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC135744/

Abstract

The ever-growing number of completely sequenced prokaryotic genomes facilitates cross-species comparisons by genomic annotation algorithms. This paper introduces a new probabilistic framework for comparative genomic analysis and demonstrates its utility in the context of improving the accuracy of prokaryotic gene start site detection. Our framework employs a product hidden Markov model (PROD-HMM) with state architecture to model the species-specific trinucleotide frequency patterns in sequences immediately upstream and downstream of a translation start site and to detect the contrasting non-synonymous (amino acid changing) and synonymous (silent) substitution rates that differentiate prokaryotic coding from intergenic regions. Depending on the intricacy of the features modeled by the hidden state architecture, intergenic, regulatory, promoter and coding regions can be delimited by this method. The new system is evaluated using a preliminary set of orthologous Pyrococcus gene pairs, for which it demonstrates an improved accuracy of detection. Its robustness is confirmed by analysis with cross-validation of an experimentally verified set of Escherichia coli K-12 and Salmonella typhimurium LT2 orthologs. The novel architecture has a number of attractive features that distinguish it from previous comparative models such as pair-HMMs.
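The contrast between synonymous and non-synonymous substitutions mentioned above can be illustrated with a small sketch. This is not the PROD-HMM itself, only a rough illustration of the kind of signal it exploits, assuming two already aligned, gap-free ortholog sequences. The function name `substitution_counts` and the example sequences are hypothetical and do not come from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_counts(seq_a, seq_b, frame=0):
    """Count differing codon pairs in the given reading frame.

    Hypothetical helper: assumes seq_a and seq_b are aligned without gaps.
    Returns (synonymous, non_synonymous) counts of differing codon pairs.
    """
    syn = nonsyn = 0
    n = min(len(seq_a), len(seq_b))
    for i in range(frame, n - 2, 3):
        ca = seq_a[i:i + 3].upper()
        cb = seq_b[i:i + 3].upper()
        # Skip identical codons and codons with ambiguous bases.
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1      # silent change
        else:
            nonsyn += 1   # amino acid changing
    return syn, nonsyn


# Made-up example sequences, for illustration only.
in_frame = substitution_counts("ATGGCTAAA", "ATGGCCAGA", frame=0)
shifted = substitution_counts("ATGGCTAAA", "ATGGCCAGA", frame=1)
```

In a true coding frame, differences between orthologs tend to be enriched for synonymous changes, whereas an out-of-frame or intergenic window shows no such bias. A probabilistic model such as the one described in the abstract would weigh this evidence jointly with sequence composition rather than use raw counts.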
