High accuracy mass spectrometry analysis as a tool to verify and improve gene annotation using Mycobacterium tuberculosis as an example.

Author information

de Souza Gustavo A, Målen Hiwa, Søfteland Tina, Saelensminde Gisle, Prasad Swati, Jonassen Inge, Wiker Harald G

Affiliation

Section for Microbiology and Immunology, The Gade Institute, University of Bergen, Bergen, Norway.

Publication information

BMC Genomics. 2008 Jul 2;9:316. doi: 10.1186/1471-2164-9-316.

DOI:10.1186/1471-2164-9-316

PMID:18597682

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2483986/

Abstract

BACKGROUND

Although genome annotations are available for diverse lineages of the Mycobacterium tuberculosis complex, divergences between gene prediction methods remain a challenge for generating unbiased protein datasets. M. tuberculosis gene annotation is an example: the most widely used datasets, from two independent institutions (the Sanger Institute and The Institute for Genomic Research, TIGR), differ by up to 12% in the number of annotated open reading frames, and 46% of the genes contained in both annotations have different start codons. Such differences emphasize the importance of identifying the sequences of protein products to validate each gene annotation, including its coding region.
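As a rough illustration of how two annotations of the same genome might be compared, the sketch below counts open reading frames (ORFs) unique to each annotation and shared ORFs whose start positions differ. It is only a sketch under stated assumptions: the gene identifiers and coordinates are invented, and pairing ORFs by a shared stop position and strand is an assumption made here for illustration, not necessarily the procedure behind the 12% and 46% figures.

```python
from typing import Dict, Tuple

# Hypothetical record type: gene id -> (start position, stop position, strand).
Annotation = Dict[str, Tuple[int, int, str]]


def compare_annotations(a: Annotation, b: Annotation) -> Dict[str, int]:
    """Count ORFs unique to each annotation and shared ORFs with different starts.

    Assumption: two ORFs are treated as the same gene when they share
    a stop position on the same strand.
    """
    # Index each annotation by (stop, strand), keeping the annotated start.
    by_stop_a = {(stop, strand): start for start, stop, strand in a.values()}
    by_stop_b = {(stop, strand): start for start, stop, strand in b.values()}

    shared = by_stop_a.keys() & by_stop_b.keys()
    different_start = [k for k in shared if by_stop_a[k] != by_stop_b[k]]

    return {
        "only_a": len(by_stop_a) - len(shared),
        "only_b": len(by_stop_b) - len(shared),
        "shared": len(shared),
        "shared_different_start": len(different_start),
    }


# Invented toy data, not real M. tuberculosis coordinates.
annotation_a: Annotation = {
    "geneA": (100, 400, "+"),
    "geneB": (500, 900, "+"),
    "geneC": (1500, 1000, "-"),
}
annotation_b: Annotation = {
    "orf1": (130, 400, "+"),   # same stop as geneA, later start
    "orf2": (500, 900, "+"),   # identical to geneB
    "orf4": (2000, 2300, "+"), # present only in annotation B
}

print(compare_annotations(annotation_a, annotation_b))
# {'only_a': 1, 'only_b': 1, 'shared': 2, 'shared_different_start': 1}
```

Dividing `shared_different_start` by `shared` would give the fraction of common genes with conflicting start positions in this toy setting.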

RESULTS

With this objective, we analyzed a culture filtrate sample from M. tuberculosis on a high-accuracy LTQ-Orbitrap mass spectrometer and applied refined N-terminal prediction to compare the two gene annotations. From a total of 449 proteins identified from the MS data, we validated 35 tryptic peptides that were specific to one of the two datasets, representing 24 different proteins. Of those, 5 proteins were annotated only in the Sanger database. For the remaining proteins, the observed differences were due to differences in annotation of transcriptional start sites.
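To make the idea of annotation-specific tryptic peptides concrete, the sketch below digests two versions of one protein in silico and reports the peptides that occur in only one version. Everything in it is illustrative: the two sequences are invented, the cleavage rule (after K or R, but not before P, with no missed cleavages) is the common textbook approximation of trypsin, and N-terminal methionine removal is not modeled. It is not the search pipeline used in the study.

```python
import re
from typing import Set, Tuple


def tryptic_peptides(protein: str, min_len: int = 6) -> Set[str]:
    """In-silico tryptic digest: cleave after K or R unless followed by P."""
    # Zero-width split points after K/R that are not followed by P
    # (splitting on empty matches requires Python 3.7 or later).
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # Drop very short peptides, which are rarely informative in MS searches.
    return {p for p in pieces if len(p) >= min_len}


def annotation_specific(seq_a: str, seq_b: str,
                        min_len: int = 6) -> Tuple[Set[str], Set[str]]:
    """Return peptides found only in seq_a and only in seq_b."""
    pep_a = tryptic_peptides(seq_a, min_len)
    pep_b = tryptic_peptides(seq_b, min_len)
    return pep_a - pep_b, pep_b - pep_a


# Invented sequences: the same hypothetical gene with an upstream start
# (long form) and a downstream start (short form).
long_form = "MSTPLRAAEGKVTDQLLNAMAEELKGSYTPK"
short_form = "MAEELKGSYTPK"

only_long, only_short = annotation_specific(long_form, short_form)
print(sorted(only_long))   # ['MSTPLR', 'VTDQLLNAMAEELK']
print(sorted(only_short))  # ['MAEELK']
```

In this toy case, observing `MSTPLR` or `VTDQLLNAMAEELK` in the spectra would support the upstream start, while `MAEELK` would support the downstream start, because that peptide only arises as the N-terminus of the short form.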

CONCLUSION

Our results indicate that, even in a less complex sample likely to represent only 10% of the bacterial proteome, we were still able to detect major differences between different gene annotation approaches. This gives hope that high-throughput proteomics techniques can be used to improve and validate gene annotations, and in particular for verification of high-throughput, automatic gene annotations.
