Stapleton Mark, Liao Guochun, Brokstein Peter, Hong Ling, Carninci Piero, Shiraki Toshiyuki, Hayashizaki Yoshihide, Champe Mark, Pacleb Joanne, Wan Ken, Yu Charles, Carlson Joe, George Reed, Celniker Susan, Rubin Gerald M
Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
Genome Res. 2002 Aug;12(8):1294-300. doi: 10.1101/gr.269102.
Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5' expressed sequence tags (5' ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to ~40% of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5' ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we had not previously exploited for gene discovery; two new cap-trapped normalized libraries are derived from 0-22-h embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70% of the predicted genes in Drosophila.
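The clustering and selection step described in the abstract (grouping ESTs by their genomic alignment, then keeping clusters that overlap genes not yet represented in DGCr1) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Interval` type, the function names, the toy coordinates, and the simplification of each alignment to a single unstranded span are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named genomic span (hypothetical type; strand is ignored here)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two spans share at least one base on the same chromosome arm."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def cluster_by_overlap(alignments):
    """Group aligned ESTs into clusters of transitively overlapping spans."""
    clusters = []
    cur_chrom, cur_end = None, None
    for est in sorted(alignments, key=lambda e: (e.chrom, e.start, e.end)):
        if clusters and est.chrom == cur_chrom and est.start < cur_end:
            # Overlaps the running cluster: extend it.
            clusters[-1].append(est)
            cur_end = max(cur_end, est.end)
        else:
            # No overlap: start a new cluster.
            clusters.append([est])
            cur_chrom, cur_end = est.chrom, est.end
    return clusters


def clusters_for_new_genes(clusters, genes, represented):
    """Keep clusters overlapping at least one gene not in `represented`.

    Returns (cluster, set_of_new_gene_names) pairs.
    """
    selected = []
    for cluster in clusters:
        new_hits = {
            g.name
            for g in genes
            if g.name not in represented
            and any(overlaps(est, g) for est in cluster)
        }
        if new_hits:
            selected.append((cluster, new_hits))
    return selected


if __name__ == "__main__":
    # Toy data: names and coordinates are invented for illustration.
    ests = [
        Interval("est1", "2L", 100, 600),
        Interval("est2", "2L", 500, 900),
        Interval("est3", "2L", 2000, 2500),
        Interval("est4", "X", 100, 400),
    ]
    genes = [
        Interval("geneA", "2L", 50, 950),
        Interval("geneB", "2L", 1900, 2600),
        Interval("geneC", "X", 0, 500),
    ]
    already_in_release_1 = {"geneA"}

    for cluster, hits in clusters_for_new_genes(
        cluster_by_overlap(ests), genes, already_in_release_1
    ):
        print([e.name for e in cluster], sorted(hits))
```

In this sketch the cluster containing `est1` and `est2` is dropped because it overlaps only a gene that is already represented, while the other two clusters are kept as candidates. Choosing a putative full-length clone within each retained cluster is a separate step that the sketch does not attempt.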