A proteogenomic survey of the Medicago truncatula genome.

Author information

Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.

Publication information

Mol Cell Proteomics. 2012 Oct;11(10):933-44. doi: 10.1074/mcp.M112.019471. Epub 2012 Jul 5.

Abstract

Peptide sequencing by computational assignment of tandem mass spectra to a database of putative protein sequences provides an independent approach to confirming or refuting protein predictions based on large-scale DNA and RNA sequencing efforts. This use of mass spectrometrically-derived sequence data for testing and refining predicted gene models has been termed proteogenomics. We report herein the application of proteogenomic methodology to a database of 10.9 million tandem mass spectra collected over a period of two years from proteolytically generated peptides isolated from the model legume Medicago truncatula. These spectra were searched against a database of predicted M. truncatula protein sequences generated from public databases, in silico gene model predictions, and a whole-genome six-frame translation. This search identified 78,647 distinct peptide sequences, and a comparison with the publicly available proteome from the recently published M. truncatula genome supported translation of 9,843 existing gene models and identified 1,568 novel peptides suggesting corrections or additions to the current annotations. Each supporting and novel peptide was independently validated using mRNA-derived deep sequencing coverage and an overall correlation of 93% between the two data types was observed. We have additionally highlighted examples of several aspects of structural annotation for which tandem MS provides unique evidence not easily obtainable through typical DNA or RNA sequencing. Proteogenomic analysis is a valuable and unique source of information for the structural annotation of genomes and should be included in such efforts to ensure that the genome models used by biologists mirror as accurately as possible what is present in the cell.
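The whole-genome six-frame translation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_translate`) are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases are rendered as `X`. It only shows how a single DNA sequence yields six candidate protein sequences (three reading frames on each strand) that could then be added to a search database.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna: str) -> dict[str, str]:
    """Translate a DNA sequence in all six reading frames.

    Keys '+1'..'+3' are forward-strand frames; '-1'..'-3' are frames
    on the reverse-complement strand.
    """
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

In practice each frame would be split at stop codons into candidate open reading frames before being searched against tandem mass spectra; that step is omitted here for brevity.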
