Institute of Bioinformatics, International Technology Park, Bangalore, Karnataka, India.
Proteomics. 2011 Feb;11(4):620-30. doi: 10.1002/pmic.201000615. Epub 2011 Jan 18.
The ability to sequence DNA rapidly, inexpensively and in a high-throughput fashion provides a unique opportunity to sequence the whole genomes of a large number of species. The cataloging of protein-coding genes from these species, however, remains a non-trivial task, with the majority of initial genome annotation dependent on gene prediction algorithms. Recent advances in mass spectrometry-based proteomics now enable the generation of accurate and comprehensive protein sequence data from tissues and organisms. Proteogenomics allows us to harness the wealth of information available at the proteome level and apply it to the available genomic information of organisms. This includes identifying novel genes and splice isoforms, assigning correct start sites and validating predicted exons and genes. It is also possible to use proteogenomics to identify protein variants that could cause diseases, to identify protein biomarkers and to study genome variation. We anticipate that proteogenomics will become a powerful approach that is routinely employed by the 'Genome and Proteome Centers' of the future.
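The central proteogenomic step described above, applying peptide-level evidence to a genome sequence, can be illustrated with a minimal sketch. It is not the workflow of any particular study: it assumes the standard genetic code, exact string matching in place of a real mass spectrometry search engine, and an unspliced sequence, so it cannot detect peptides that span splice junctions. The function names (`six_frame_translations`, `find_peptide`) and the toy sequence are hypothetical.

```python
# Minimal sketch: locate a peptide in the six-frame translation of a DNA sequence.
# Assumptions: standard genetic code, exact matching, no introns.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Yield (strand, offset, protein) for all six reading frames."""
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def find_peptide(dna, peptide):
    """Return (strand, start) pairs for every exact match of the peptide.

    start is the 0-based nucleotide position on the strand being read,
    i.e. on the reverse complement for '-' hits.
    """
    hits = []
    for strand, offset, protein in six_frame_translations(dna):
        index = protein.find(peptide)
        while index != -1:
            hits.append((strand, offset + 3 * index))
            index = protein.find(peptide, index + 1)
    return hits


# Toy example: a short open reading frame (ATG GCT AAA GGT TGA) with flanking bases.
toy_genome = "CC" + "ATGGCTAAAGGTTGA" + "G"
toy_hits = find_peptide(toy_genome, "MAKG")
```

A hit that falls outside every annotated gene would be a candidate novel gene, and a hit upstream of an annotated start codon in the same frame would suggest a revised start site. Real pipelines add spectrum matching, false discovery rate control and splice-aware databases, none of which is modeled here.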