Ansong Charles, Purvine Samuel O, Adkins Joshua N, Lipton Mary S, Smith Richard D
Biological Sciences Division, Pacific Northwest National Laboratory, P.O. Box 999/K8-98, Richland, WA 99352, USA.
Brief Funct Genomic Proteomic. 2008 Jan;7(1):50-62. doi: 10.1093/bfgp/eln010. Epub 2008 Mar 10.
While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation--the process of identifying genes and assigning function to each gene in a genome sequence--provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.
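As a rough illustration of how measured peptides can verify coding regions, the sketch below translates a genome fragment in all six reading frames and reports where each identified peptide matches exactly. It is a toy example under stated assumptions, not the authors' pipeline: the function names (`six_frame_hits`, `split_by_annotation`), the exact-string matching, and the tuple layout are all hypothetical, and a real workflow would score spectra with a database search engine and control the false discovery rate.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(genome, peptides):
    """Find exact peptide matches in all six reading frames.

    Returns tuples (peptide, strand, frame, start, end), where start/end are
    0-based, end-exclusive nucleotide coordinates on the forward strand.
    """
    hits = []
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for pep in peptides:
                pos = protein.find(pep)
                while pos != -1:
                    start = frame + 3 * pos
                    end = start + 3 * len(pep)
                    if strand == "-":
                        # Convert reverse-strand coordinates to forward ones.
                        start, end = n - end, n - start
                    hits.append((pep, strand, frame, start, end))
                    pos = protein.find(pep, pos + 1)
    return hits


def split_by_annotation(hits, predicted_genes):
    """Separate hits inside predicted genes from hits outside any prediction.

    predicted_genes is a list of (start, end, strand) intervals (assumed
    format). Simplification: only strand and containment are checked, not
    whether the hit shares the gene's reading frame.
    """
    confirmed, novel = [], []
    for hit in hits:
        _, strand, _, start, end = hit
        inside = any(
            s == strand and g0 <= start and end <= g1
            for g0, g1, s in predicted_genes
        )
        (confirmed if inside else novel).append(hit)
    return confirmed, novel


if __name__ == "__main__":
    genome = "ATGGCCAAATAA"  # made-up fragment encoding M-A-K-stop
    hits = six_frame_hits(genome, ["MAK"])
    print(split_by_annotation(hits, [(0, 12, "+")]))
```

Hits that fall inside a predicted gene support that gene call at the protein level, while hits outside every prediction mark candidate regions where a gene may have been missed or its boundaries drawn incorrectly.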