Merrihew Gennifer E, Davis Colleen, Ewing Brent, Williams Gary, Käll Lukas, Frewen Barbara E, Noble William Stafford, Green Phil, Thomas James H, MacCoss Michael J
University of Washington, Department of Genome Sciences, Seattle, Washington 98195, USA.
Genome Res. 2008 Oct;18(10):1660-9. doi: 10.1101/gr.077644.108. Epub 2008 Jul 24.
We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode Caenorhabditis elegans. We detected 6779 C. elegans proteins (67,047 peptides), including 384 that, although annotated in WormBase WS150, lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half (192/429) of the new coding sequences were confirmed with RT-PCR data. Thirty-three (approximately 8%) of the new coding sequences had been predicted to be pseudogenes, 151 (approximately 35%) reveal apparent errors in gene models, and 245 (57%) appear to be novel genes. In addition, we verified 6010 exon-exon splice junctions within existing WormBase gene models. Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. In addition, the collection of identified peptides should facilitate future proteomics experiments targeted at specific proteins of interest.