Bonaldo M F, Lennon G, Soares M B
Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, New York, USA.
Genome Res. 1996 Sep;6(9):791-806. doi: 10.1101/gr.6.9.791.
Large-scale sequencing of cDNAs randomly picked from libraries has proven to be a very powerful approach for discovering (putatively) expressed sequences that, once mapped, may greatly expedite the identification and cloning of human disease genes. However, the integrity of the data and the pace at which novel sequences can be identified depend to a great extent on the cDNA libraries that are used. In a typical cell, the mRNAs of the prevalent and intermediate frequency classes together comprise as much as 50-65% of the total mRNA mass but represent no more than 1000-2000 different mRNAs. Redundant identification of mRNAs of these two frequency classes is therefore destined to become overwhelming relatively early in any such random gene discovery program, seriously compromising its cost-effectiveness. With the goal of facilitating such efforts, we previously developed a method to construct directionally cloned normalized cDNA libraries and applied it to generate infant brain (INIB) and fetal liver/spleen (INFLS) libraries, from which a total of 45,192 and 86,088 expressed sequence tags, respectively, have been derived. While improving the representation of the longest cDNAs in our libraries, we developed three additional methods to normalize cDNA libraries and generated over 35 libraries, most of which have been contributed to our Integrated Molecular Analysis of Genomes and their Expression (IMAGE) Consortium and thus distributed widely and used for sequencing and mapping. To facilitate gene discovery further, we have also developed a subtractive hybridization approach designed specifically to eliminate (or significantly reduce the representation of) large pools of arrayed and (mostly) sequenced clones from normalized libraries yet to be (or only partly) surveyed.
Here we present a detailed description and a comparative analysis of four methods that we developed and used to generate normalized cDNA libraries from human (15), mouse (3), rat (2), and the parasite Schistosoma mansoni (1). In addition, we describe the construction and preliminary characterization of a subtracted liver/spleen library (INFLS-SI) that resulted from the elimination (or reduction of representation) of ~5000 INFLS-IMAGE clones from the INFLS library.
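
The redundancy argument in the abstract can be illustrated with a short calculation. The sketch below is not the authors' analysis; it is a simplified model, assuming clones are picked independently and that every mRNA species within a frequency class is equally abundant. The figures of 1500 species and 60% of the mRNA mass are midpoints of the ranges quoted above. The 15,000 rare species, the 5000 picks, and the function name `expected_distinct` are hypothetical values and names chosen only for illustration.

```python
def expected_distinct(class_sizes, class_mass, picks):
    """Expected number of distinct mRNA species seen after `picks` random clones.

    class_sizes: number of species in each frequency class
    class_mass:  fraction of total mRNA mass held by each class (sums to 1)
    Assumes independent picks and equal abundance within a class.
    """
    total = 0.0
    for n_species, mass in zip(class_sizes, class_mass):
        p = mass / n_species  # chance that one pick hits a given species
        # A species is seen at least once with probability 1 - (1 - p)^picks.
        total += n_species * (1.0 - (1.0 - p) ** picks)
    return total


# Non-normalized library: prevalent + intermediate classes versus a rare class.
# 1500 species / 60% of mass are midpoints of the ranges in the abstract;
# the 15,000 rare species are an assumption made for this example.
SKEWED_SIZES = [1500, 15000]
SKEWED_MASS = [0.60, 0.40]

# Idealized normalized library: the same species, all equally represented.
NORMALIZED_SIZES = [sum(SKEWED_SIZES)]
NORMALIZED_MASS = [1.0]

if __name__ == "__main__":
    picks = 5000  # hypothetical number of sequenced clones
    skewed = expected_distinct(SKEWED_SIZES, SKEWED_MASS, picks)
    normalized = expected_distinct(NORMALIZED_SIZES, NORMALIZED_MASS, picks)
    print(f"non-normalized: {skewed:.0f} distinct species in {picks} picks")
    print(f"normalized:     {normalized:.0f} distinct species in {picks} picks")
```

Under these assumptions the idealized normalized library yields more distinct species for the same number of sequenced clones, which is the cost-effectiveness point the abstract makes. Real libraries are neither perfectly uniform within a class nor perfectly normalized, so the numbers are indicative only.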