A proteogenomic survey of the Medicago truncatula genome.

Author information

Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.

Publication information

Mol Cell Proteomics. 2012 Oct;11(10):933-44. doi: 10.1074/mcp.M112.019471. Epub 2012 Jul 5.

DOI:10.1074/mcp.M112.019471

PMID:22774004

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3494139/

Abstract

Peptide sequencing by computational assignment of tandem mass spectra to a database of putative protein sequences provides an independent approach to confirming or refuting protein predictions based on large-scale DNA and RNA sequencing efforts. This use of mass spectrometrically-derived sequence data for testing and refining predicted gene models has been termed proteogenomics. We report herein the application of proteogenomic methodology to a database of 10.9 million tandem mass spectra collected over a period of two years from proteolytically generated peptides isolated from the model legume Medicago truncatula. These spectra were searched against a database of predicted M. truncatula protein sequences generated from public databases, in silico gene model predictions, and a whole-genome six-frame translation. This search identified 78,647 distinct peptide sequences, and a comparison with the publicly available proteome from the recently published M. truncatula genome supported translation of 9,843 existing gene models and identified 1,568 novel peptides suggesting corrections or additions to the current annotations. Each supporting and novel peptide was independently validated using mRNA-derived deep sequencing coverage and an overall correlation of 93% between the two data types was observed. We have additionally highlighted examples of several aspects of structural annotation for which tandem MS provides unique evidence not easily obtainable through typical DNA or RNA sequencing. Proteogenomic analysis is a valuable and unique source of information for the structural annotation of genomes and should be included in such efforts to ensure that the genome models used by biologists mirror as accurately as possible what is present in the cell.
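The whole-genome six-frame translation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the standard genetic code, treats any codon containing an ambiguous base as X, and uses hypothetical function names (reverse_complement, translate, six_frame_translation) chosen only for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (assumption:
# no organelle-specific or alternative code is needed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate three forward and three reverse-strand reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

In a proteogenomic search, each translated frame would typically be split at stop codons ('*') into candidate open reading frames before being added to the search database; that step is omitted here.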

Similar articles

The Medicago truncatula small protein proteome and peptidome.

J Proteome Res. 2006 Dec;5(12):3355-67. doi: 10.1021/pr060336t.

MtSSPdb: The Small Secreted Peptide Database.

Plant Physiol. 2020 May;183(1):399-413. doi: 10.1104/pp.19.01088. Epub 2020 Feb 20.

Concerted action of the new Genomic Peptide Finder and AUGUSTUS allows for automated proteogenomic annotation of the Chlamydomonas reinhardtii genome.

Proteomics. 2011 May;11(9):1814-23. doi: 10.1002/pmic.201000621. Epub 2011 Mar 22.

Tissue-specific Proteogenomic Analysis of Plutella xylostella Larval Midgut Using a Multialgorithm Pipeline.

Mol Cell Proteomics. 2016 Jun;15(6):1791-807. doi: 10.1074/mcp.M115.050989. Epub 2016 Feb 22.

Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics.

J Proteomics. 2013 Jun 28;86:27-42. doi: 10.1016/j.jprot.2013.04.036. Epub 2013 May 9.

Proteome reference maps of Medicago truncatula embryogenic cell cultures generated from single protoplasts.

Proteomics. 2004 Jul;4(7):1883-96. doi: 10.1002/pmic.200300803.

Proteogenomic Gene Structure Validation in the Pineapple Genome.

J Proteome Res. 2024 May 3;23(5):1583-1592. doi: 10.1021/acs.jproteome.3c00675. Epub 2024 Apr 23.

A two-dimensional electrophoresis proteomic reference map and systematic identification of 1367 proteins from a cell suspension culture of the model legume Medicago truncatula.

Mol Cell Proteomics. 2005 Nov;4(11):1812-25. doi: 10.1074/mcp.D500005-MCP200. Epub 2005 Jul 26.

Database construction and peptide identification strategies for proteogenomic studies on sequenced genomes.

Curr Top Med Chem. 2014;14(3):425-34. doi: 10.2174/1568026613666131204105652.

Cited by

Functional Genomics of Legumes in Bulgaria-Advances and Future Perspectives.

Genes (Basel). 2025 Feb 28;16(3):296. doi: 10.3390/genes16030296.

Nematode gene annotation by machine-learning-assisted proteotranscriptomics enables proteome-wide evolutionary analysis.

Genome Res. 2023 Jan;33(1):112-128. doi: 10.1101/gr.277070.122. Epub 2023 Jan 18.

Proteomics in Non-model Organisms: A New Analytical Frontier.

J Proteome Res. 2020 Sep 4;19(9):3595-3606. doi: 10.1021/acs.jproteome.0c00448. Epub 2020 Aug 20.

Full-Length Transcript-Based Proteogenomics of Rice Improves Its Genome and Proteome Annotation.

Plant Physiol. 2020 Mar;182(3):1510-1526. doi: 10.1104/pp.19.00430. Epub 2019 Dec 19.

Codon usage and amino acid usage influence genes expression level.

Genetica. 2018 Feb;146(1):53-63. doi: 10.1007/s10709-017-9996-4. Epub 2017 Oct 14.

A proteomic atlas of the legume Medicago truncatula and its nitrogen-fixing endosymbiont Sinorhizobium meliloti.

Nat Biotechnol. 2016 Nov;34(11):1198-1205. doi: 10.1038/nbt.3681. Epub 2016 Oct 17.

Mass Spectrometric-Based Selected Reaction Monitoring of Protein Phosphorylation during Symbiotic Signaling in the Model Legume, Medicago truncatula.

PLoS One. 2016 May 20;11(5):e0155460. doi: 10.1371/journal.pone.0155460. eCollection 2016.

Tissue-specific Proteogenomic Analysis of Plutella xylostella Larval Midgut Using a Multialgorithm Pipeline.

Mol Cell Proteomics. 2016 Jun;15(6):1791-807. doi: 10.1074/mcp.M115.050989. Epub 2016 Feb 22.

Dual use of peptide mass spectra: Protein atlas and genome annotation.

Curr Plant Biol. 2015 May 1;2:21-24. doi: 10.1016/j.cpb.2015.02.001. Epub 2015 Apr 13.

Proteogenomics from a bioinformatics angle: A growing field.

Mass Spectrom Rev. 2017 Sep;36(5):584-599. doi: 10.1002/mas.21483. Epub 2015 Dec 15.
