Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes.

Authors

Makita Yuko, de Hoon Michiel J L, Danchin Antoine

Affiliation

Unit of Genetics of Bacterial Genomes, Institut Pasteur, URA CNRS 2171, Cedex 15, Paris, France.

Publication information

BMC Bioinformatics. 2007 Feb 8;8:47. doi: 10.1186/1471-2105-8-47.

DOI: 10.1186/1471-2105-8-47

PMID: 17286872

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1805508/

Abstract

BACKGROUND

Computational prediction methods are currently used to identify genes in prokaryotic genomes. However, identifying the correct translation initiation sites remains a difficult task. Accurate translation initiation sites (TISs) are important not only for the annotation of unknown proteins but also for the prediction of operons, promoters, and small non-coding RNA genes, since these predictions typically rely on intergenic distances. A further problem is that most existing methods are optimized for Escherichia coli data sets; applying them to newly sequenced bacterial genomes may not yield an equivalent level of accuracy.

RESULTS

Based on a biological representation of the translation process, we applied Bayesian statistics to create a score function for predicting translation initiation sites. In contrast to existing programs, our combination of methods uses supervised learning to optimally use the set of known translation initiation sites. We combined the Ribosome Binding Site (RBS) sequence, the distance between the translation initiation site and the RBS sequence, the base composition of the start codon, the nucleotide composition (A-rich sequences) following start codons, and the expected distribution of the protein length in a Bayesian scoring function. To further increase the prediction accuracy, we also took into account the operon orientation. The outcome of the procedure achieved a prediction accuracy of 93.2% in 858 E. coli genes from the EcoGene data set and 92.7% accuracy in a data set of 1243 Bacillus subtilis 'non-y' genes. We confirmed the performance in the GC-rich Gamma-Proteobacteria Herminiimonas arsenicoxydans, Pseudomonas aeruginosa, and Burkholderia pseudomallei K96243.
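
The way several sequence features can be combined into one score may be illustrated with a minimal sketch. It assumes the features are conditionally independent, so that their log-likelihood ratios simply add (a naive Bayes simplification). This is not the authors' implementation: the function names, the `GGAGG` motif, the spacing window, the A-richness threshold, and every probability below are hypothetical placeholders. The sketch also folds the RBS-to-start distance into the RBS feature and omits the protein length and operon orientation terms.

```python
import math

# Illustrative likelihoods as (P(value | true start), P(value | false start)).
# Every number here is a made-up placeholder, not a value from the paper.
START_CODON = {"ATG": (0.80, 0.40), "GTG": (0.15, 0.35), "TTG": (0.05, 0.25)}
RBS_MATCH = {True: (0.70, 0.10), False: (0.30, 0.90)}
A_RICH = {True: (0.60, 0.40), False: (0.40, 0.60)}


def log_odds(table, value):
    """Log-likelihood ratio contributed by one observed feature value."""
    p_true, p_false = table[value]
    return math.log(p_true / p_false)


def has_rbs(upstream, motif="GGAGG", min_gap=4, max_gap=12):
    """True if the motif occurrence nearest the start codon ends
    min_gap..max_gap bases upstream of it (hypothetical window)."""
    pos = upstream.rfind(motif)
    if pos < 0:
        return False
    gap = len(upstream) - (pos + len(motif))
    return min_gap <= gap <= max_gap


def score_candidate(seq, start):
    """Sum per-feature log-odds for the candidate start codon at index `start`."""
    codon = seq[start:start + 3]
    if codon not in START_CODON:
        return float("-inf")  # not a recognised start codon
    upstream = seq[max(0, start - 20):start]
    downstream = seq[start + 3:start + 15]
    # Hypothetical threshold: at least 40% A in the 12 bases after the codon.
    a_rich = downstream.count("A") / max(len(downstream), 1) >= 0.4
    return (log_odds(START_CODON, codon)
            + log_odds(RBS_MATCH, has_rbs(upstream))
            + log_odds(A_RICH, a_rich))


def best_start(seq, candidates):
    """Pick the candidate start position with the highest combined score."""
    return max(candidates, key=lambda s: score_candidate(seq, s))
```

In a supervised setting such as the one described, the placeholder tables would be replaced by frequencies estimated from a set of known translation initiation sites; summing log-odds keeps each feature's contribution easy to inspect.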

CONCLUSION

Hon-yaku, being based on a careful choice of elements important in translation, improved the prediction accuracy in B. subtilis data sets and other bacteria except for E. coli. We believe that most remaining mispredictions are due to atypical ribosomal binding sequences used in specific translation control processes, or likely errors in the training data sets.

Similar articles

Accuracy improvement for identifying translation initiation sites in microbial genomes.

Bioinformatics. 2004 Dec 12;20(18):3308-17. doi: 10.1093/bioinformatics/bth390. Epub 2004 Jul 9.

Identifying translation initiation sites in prokaryotes using support vector machine.

J Theor Biol. 2010 Feb 21;262(4):644-9. doi: 10.1016/j.jtbi.2009.10.023. Epub 2009 Oct 17.

GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

Nucleic Acids Res. 2001 Jun 15;29(12):2607-18. doi: 10.1093/nar/29.12.2607.

MetWAMer: eukaryotic translation initiation site prediction.

BMC Bioinformatics. 2008 Sep 18;9:381. doi: 10.1186/1471-2105-9-381.

MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

BMC Bioinformatics. 2007 Mar 16;8:97. doi: 10.1186/1471-2105-8-97.

Computational evaluation of TIS annotation for prokaryotic genomes.

BMC Bioinformatics. 2008 Mar 25;9:160. doi: 10.1186/1471-2105-9-160.

An unsupervised classification scheme for improving predictions of prokaryotic TIS.

BMC Bioinformatics. 2006 Mar 9;7:121. doi: 10.1186/1471-2105-7-121.

Improved prediction of bacterial transcription start sites.

Bioinformatics. 2006 Jan 15;22(2):142-8. doi: 10.1093/bioinformatics/bti771. Epub 2005 Nov 15.

A symbolic-numeric approach to find patterns in genomes. Application to the translation initiation sites of E. coli.

Biochimie. 1999 Nov;81(11):1065-72. doi: 10.1016/s0300-9084(99)00328-4.

Cited by

Identification of Translation Start Sites in Bacterial Genomes.

Methods Mol Biol. 2021;2252:27-55. doi: 10.1007/978-1-0716-1150-0_2.

Retapamulin-Assisted Ribosome Profiling Reveals the Alternative Bacterial Proteome.

Mol Cell. 2019 May 2;74(3):481-493.e6. doi: 10.1016/j.molcel.2019.02.017. Epub 2019 Mar 20.

No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects.

Microb Biotechnol. 2018 Jul;11(4):588-605. doi: 10.1111/1751-7915.13284. Epub 2018 May 28.

Gene prediction in metagenomic fragments based on the SVM algorithm.

BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S12. doi: 10.1186/1471-2105-14-S5-S12. Epub 2013 Apr 10.

Re-annotation of two hyperthermophilic archaea Pyrococcus abyssi GE5 and Pyrococcus furiosus DSM 3638.

Curr Microbiol. 2012 Feb;64(2):118-29. doi: 10.1007/s00284-011-0035-x. Epub 2011 Nov 6.

Genome reannotation of Escherichia coli CFT073 with new insights into virulence.

BMC Genomics. 2009 Nov 22;10:552. doi: 10.1186/1471-2164-10-552.

The Genome Reverse Compiler: an explorative annotation tool.

BMC Bioinformatics. 2009 Jan 27;10:35. doi: 10.1186/1471-2105-10-35.

Experimental determination of translational start sites resolves uncertainties in genomic open reading frame predictions - application to Mycobacterium tuberculosis.

Microbiology (Reading). 2009 Jan;155(Pt 1):186-197. doi: 10.1099/mic.0.022889-0.

Phylogenetic and evolutionary relationships of RubisCO and the RubisCO-like proteins and the functional lessons provided by diverse molecular forms.

Philos Trans R Soc Lond B Biol Sci. 2008 Aug 27;363(1504):2629-40. doi: 10.1098/rstb.2008.0023.

ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes.

Nucleic Acids Res. 2008 Jan;36(Database issue):D114-9. doi: 10.1093/nar/gkm799. Epub 2007 Oct 16.
