Kumar Dhirendra, Dash Debasis
G.N. Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Mathura Road, Delhi, 110025, India.
Adv Exp Med Biol. 2016;926:1-10. doi: 10.1007/978-3-319-42316-6_1.
Proteogenomic strategies aim to refine genome-wide annotations of protein-coding features using direct protein-level observations. Most current proteogenomic approaches rely on integrative analysis of multiple types of high-throughput omics data, e.g., genomics, transcriptomics, and proteomics. Recent efforts to create a human proteome map, which aimed primarily to detect experimentally at least one protein product for each gene in the genome, made extensive use of proteogenomic approaches. The 14-year wait for a draft human proteome map, after the completion of comparable efforts to sequence the genome, reflects the complexity and technical hurdles of such efforts. Moreover, the integrative analysis of large-scale multi-omics datasets inherent to these studies is a major bottleneck to their success. However, recently developed analysis tools and pipelines dedicated to proteogenomics reduce both the time and the complexity of such analysis. Here, we summarize notable approaches, studies, and software developments, along with their potential applications to eukaryotic genome annotation and clinical proteogenomics.