Proteogenomics from a bioinformatics angle: A growing field.

Affiliations

Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Lab of Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium.

Department of Biochemistry and Molecular Pharmacology, Center for Health Informatics and Bioinformatics, New York University School of Medicine, New York, NY.

Publication information

Mass Spectrom Rev. 2017 Sep;36(5):584-599. doi: 10.1002/mas.21483. Epub 2015 Dec 15.

DOI:10.1002/mas.21483

PMID:26670565

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6101030/

Abstract

Proteogenomics is a research area that combines fields such as proteomics and genomics in a multi-omics setup, using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation or to unravel proteome complexity. Mass spectrometry-based identifications of matching or homologous peptides can further refine gene models. In addition, novel proteoforms can be identified by analyzing high-throughput sequencing data from RNA-seq or ribosome profiling experiments to detect novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation, or novel (small) open reading frames in intergenic or untranslated genic regions. Other proteogenomics studies that combine proteomics and genomics techniques focus on antibody sequencing or on the identification of immunogenic peptides or venom peptides. Over the years, a growing number of bioinformatics tools and databases have become available to help streamline these cross-omics studies. Some of these solutions only assist with specific steps of a proteogenomics study, e.g., building custom sequence databases (based on next-generation sequencing output) for matching mass spectrometry fragmentation spectra. Over the last few years, a handful of integrative tools that can execute complete proteogenomics analyses have also become available. Some of these are offered as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review, we aim to give a comprehensive overview of all the bioinformatics solutions that are available for this growing research area. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:584-599, 2017.
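The custom-database step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the review or from any tool it surveys: the function names (`find_orfs`, `to_fasta`), the `id|frameN|ntPOS` FASTA header, the minimum-length cutoff, and the restriction to the three forward frames are all hypothetical simplifications. Near-cognate start codons are modeled here as codons differing from ATG at exactly one position and, as an assumption, are decoded as methionine.

```python
"""Hypothetical sketch: build a custom protein FASTA from transcript sequences."""

from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_start(codon, near_cognate=False):
    """ATG, or (optionally) any codon differing from ATG at exactly one position."""
    if codon == "ATG":
        return True
    if not near_cognate:
        return False
    return sum(a != b for a, b in zip(codon, "ATG")) == 1


def find_orfs(seq, min_len=10, near_cognate=False):
    """Yield (frame, start_nt, protein) for stop-terminated ORFs in the 3 forward frames.

    Every start codon is reported, so nested ORFs (e.g. N-terminal extensions)
    each appear as a separate candidate.
    """
    seq = seq.upper().replace("U", "T")
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for i, codon in enumerate(codons):
            if not is_start(codon, near_cognate):
                continue
            protein = ["M"]  # assumption: every start codon is decoded as Met
            for nxt in codons[i + 1:]:
                aa = CODON_TABLE.get(nxt, "X")  # unknown bases (e.g. N) -> X
                if aa == "*":
                    break
                protein.append(aa)
            else:
                continue  # no in-frame stop codon: skip the incomplete ORF
            if len(protein) >= min_len:
                yield frame, frame + 3 * i, "".join(protein)


def to_fasta(transcripts, **kwargs):
    """Render candidate ORFs as FASTA with hypothetical 'id|frameN|ntPOS' headers."""
    lines = []
    for tid, seq in transcripts.items():
        for frame, start, protein in find_orfs(seq, **kwargs):
            lines.append(f">{tid}|frame{frame}|nt{start}")
            lines.append(protein)
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {"tx1": "CTGGCCATGAAATTTTAA"}  # toy sequence, not real data
    print(to_fasta(demo, min_len=3))
    print(to_fasta(demo, min_len=3, near_cognate=True))
```

On the toy transcript, the default run reports only the ATG-initiated ORF (`MKF`), while `near_cognate=True` also reports the CTG-initiated N-terminal extension (`MAMKF`), the kind of alternative proteoform such sequence databases are meant to capture. A production pipeline would additionally need to handle the reverse strand, sequence variants and redundant entries.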
