Lamesch Philippe, Milstein Stuart, Hao Tong, Rosenberg Jennifer, Li Ning, Sequerra Reynaldo, Bosak Stephanie, Doucette-Stamm Lynn, Vandenhaute Jean, Hill David E, Vidal Marc
Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA.
Genome Res. 2004 Oct;14(10B):2064-9. doi: 10.1101/gr.2496804.
The first version of the Caenorhabditis elegans ORFeome cloning project, based on release WS9 of WormBase (August 1999), provided experimental verification for approximately 55% of predicted protein-encoding open reading frames (ORFs). The remaining 45% of predicted ORFs could not be cloned, possibly as a result of mispredicted gene boundaries. Since the release of WS9, gene predictions have improved continuously. To test the accuracy of these evolving predictions, we attempted to PCR-amplify from a highly representative worm cDNA library, and then Gateway-clone, approximately 4200 ORFs that were missed earlier and for which new predictions are available in WS100 (May 2003). In this set we successfully cloned 63% of ORFs with supporting experimental data ("touched" ORFs) and 42% of ORFs with no supporting experimental evidence ("untouched" ORFs). Approximately 2000 full-length ORFs were cloned in-frame, 13% of which were corrected in their exon/intron structure relative to WS100 predictions. In total, approximately 12,500 C. elegans ORFs are now available as Gateway Entry clones for various reverse proteomics applications (ORFeome v3.1). This work illustrates why the cloning of a complete C. elegans ORFeome, and likely the ORFeomes of other multicellular organisms, needs to be an iterative process that requires multiple rounds of experimental validation together with gradually improving gene predictions.