Department for Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany.
Genome Res. 2010 Jun;20(6):837-46. doi: 10.1101/gr.103119.109. Epub 2010 Mar 17.
Pristionchus pacificus is a nematode model organism whose genome has recently been sequenced. To refine the genome annotation we performed transcriptome and proteome analysis and gathered comprehensive experimental information on gene expression. Transcriptome analysis on a 454 Life Sciences (Roche) FLX platform generated >700,000 expressed sequence tags (ESTs) from two normalized EST libraries, whereas proteome analysis on an LTQ-Orbitrap mass spectrometer detected >27,000 nonredundant peptide sequences from more than 4000 proteins at sub-parts-per-million (ppm) mass accuracy and a false discovery rate of <1%. Retraining of the SNAP gene prediction algorithm using the gene expression data led to a decrease in the number of previously predicted protein-coding genes from 29,000 to 24,000 and refinement of numerous gene models. The P. pacificus proteome contains a high proportion of small proteins with no known homologs in other species ("pioneer" proteins). Some of these proteins appear to be products of highly homologous genes, pointing to their common origin. We show that >50% of all pioneer genes are transcribed under standard culture conditions and that pioneer proteins significantly contribute to a unimodal distribution of predicted protein sizes in P. pacificus, which has an unusually low median size of 240 amino acids (26.8 kDa). In contrast, the predicted proteome of Caenorhabditis elegans follows a distinct bimodal protein size distribution, with significant functional differences between small and large protein populations. Combined, these results provide the first catalog of the expressed genome of P. pacificus, refinement of its genome annotation, and the first comparison of related nematode models at the proteome level.