Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Lab of Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium.
Department of Biochemistry and Molecular Pharmacology, Center for Health Informatics and Bioinformatics, New York University School of Medicine, New York, NY.
Mass Spectrom Rev. 2017 Sep;36(5):584-599. doi: 10.1002/mas.21483. Epub 2015 Dec 15.
Proteogenomics is a research area that combines proteomics and genomics in a multi-omics setup, using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation and to unravel proteome complexity. Mass spectrometry-based identification of matching or homologous peptides can further refine gene models. The identification of novel proteoforms is also made possible by analyzing high-throughput sequencing data from RNA-seq or ribosome profiling experiments to detect novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation, or novel (small) open reading frames in intergenic or untranslated genic regions. Other proteogenomics studies that combine proteomics and genomics techniques focus on antibody sequencing or on the identification of immunogenic peptides or venom peptides. Over the years, a growing number of bioinformatics tools and databases have become available to help streamline these cross-omics studies. Some of these solutions help only in specific steps of a proteogenomics study, e.g., building custom sequence databases (based on next-generation sequencing output) for mass spectrometry fragmentation spectrum matching. Over the last few years, a handful of integrative tools that can execute complete proteogenomics analyses have also become available. Some of these are presented as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review, we aim to give a comprehensive overview of all the bioinformatics solutions that are available for this growing research area. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:584-599, 2017.
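As an illustration of the custom sequence database step mentioned in the abstract, the following is a minimal sketch and not the method of any tool covered by the review. It assumes a plain nucleotide sequence as input, uses the standard genetic code, considers only ATG (cognate) start codons, and reports only open reading frames that end in a stop codon. The function names (`reverse_complement`, `translate`, `find_orfs`, `to_fasta`) and the FASTA header layout are hypothetical choices made for this example.

```python
# Minimal sketch: six-frame translation of a nucleotide sequence into
# candidate ORF peptides, written out as FASTA for spectrum matching.
# All names here are illustrative assumptions, not an existing tool's API.

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
# Standard genetic code, built in TCAG order; '*' marks a stop codon.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq, min_len=10):
    """Yield (strand, frame, peptide) for ATG-initiated, stop-terminated ORFs."""
    seq = seq.upper()
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(template[frame:])
            # Drop the last segment: it is not followed by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    yield strand, frame, segment[start:]


def to_fasta(orfs, prefix="orf"):
    """Format ORF tuples as FASTA text with a simple, made-up header layout."""
    lines = []
    for number, (strand, frame, peptide) in enumerate(orfs, start=1):
        lines.append(f">{prefix}_{number}|strand={strand}|frame={frame}")
        lines.append(peptide)
    return "\n".join(lines)


if __name__ == "__main__":
    example = "CCATGGCTAAATGACC"  # toy sequence, not real data
    print(to_fasta(find_orfs(example, min_len=3)))
```

A real pipeline would typically start from assembled transcripts or ribosome profiling evidence rather than raw six-frame translation, and would also have to handle near-cognate start codons and sequence variants, which this sketch deliberately leaves out.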