Reannotation of translational start sites in the genome of Mycobacterium tuberculosis.

Affiliation

Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA.

Publication information

Tuberculosis (Edinb). 2013 Jan;93(1):18-25. doi: 10.1016/j.tube.2012.11.012. Epub 2012 Dec 26.

DOI:10.1016/j.tube.2012.11.012

PMID:23273318

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3582765/

Abstract

Identification and correction of incorrect open reading frame (ORF) start sites is important for a variety of experimental and analytical purposes, ranging from cloning to inference of operon structure. The genome of the H37Rv reference strain of Mycobacterium tuberculosis (Mtb) was annotated when it was first sequenced nearly 15 years ago. While this annotation has served the TB research community well as a standard of reference for over a decade, it has been demonstrated experimentally that the actual start sites for an estimated 5-10% of ORFs differ from the annotation. In this paper, we present a comprehensive bioinformatic analysis of all 3989 ORFs in the M. tuberculosis H37Rv genome. Our method combines information from comparative analysis (alignment to start sites of orthologs in other Actinobacteria), sequence conservation, "protein likeness", putative ribosome binding sites, and other data to identify translational start sites. The features are combined in a linear model that is trained on a dataset of known start sites verified by mass spectrometry, with a cross-validated accuracy of 94%. The method can be viewed as an augmentation of Hidden Markov Model-based tools such as Glimmer and GeneMark: it incorporates more information than just the raw genomic sequence to decide which position is the legitimate translational start site for each ORF. Using this analysis, we identify 269 genes that most likely need to be re-annotated, and identify the best alternative translational start site for each. These revised ORF definitions could be used in the reannotation of the H37Rv genome, as well as to prioritize genes for experimental start-site validation.
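The general idea of scoring candidate start sites with a linear combination of features can be illustrated with a minimal Python sketch. It is not the authors' implementation. The start-codon set (ATG, GTG, TTG), the feature names, the feature values, and the weights are all hypothetical placeholders chosen for illustration. In the paper the weights are learned from start sites verified by mass spectrometry, and the feature set is richer than the one shown here.

```python
from typing import Dict, List

# Assumption: only the three most common bacterial start codons are considered.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq: str, stop_index: int) -> List[int]:
    """List in-frame start-codon positions upstream of the stop codon at stop_index.

    The walk moves upstream one codon at a time and halts at the previous
    in-frame stop codon, which bounds the open reading frame.
    """
    starts = []
    pos = stop_index - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            starts.append(pos)
        pos -= 3
    return sorted(starts)


def linear_score(features: Dict[str, float], weights: Dict[str, float],
                 bias: float = 0.0) -> float:
    """Weighted sum of feature values; features missing for a candidate count as 0."""
    return bias + sum(w * features.get(name, 0.0) for name, w in weights.items())


def best_start(candidates: Dict[int, Dict[str, float]],
               weights: Dict[str, float]) -> int:
    """Return the candidate position with the highest linear score."""
    return max(candidates, key=lambda pos: linear_score(candidates[pos], weights))


# Toy sequence: an in-frame stop (TAA) at 0, GTG at 3, ATG at 9, stop (TGA) at 18.
seq = "TAAGTGCCCATGAAATTTTGA"
positions = candidate_starts(seq, stop_index=18)

# Hypothetical feature values and weights, for illustration only.
weights = {"ortholog_alignment": 2.0, "rbs": 1.0}
features = {
    3: {"ortholog_alignment": 1.0, "rbs": 0.2},
    9: {"ortholog_alignment": 0.0, "rbs": 0.9},
}
chosen = best_start({p: features[p] for p in positions}, weights)
```

In this toy case the upstream GTG outscores the ATG because the hypothetical ortholog-alignment feature carries the larger weight. A sequence-only predictor has no access to that kind of evidence, which is the sort of additional information the abstract describes.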
