Nehls M, Pfeifer D, Micklem G, Schmoor C, Boehm T
Department of Medicine I, University of Freiburg, Germany.
Curr Biol. 1994 Nov 1;4(11):983-9. doi: 10.1016/s0960-9822(00)00222-0.
A central issue in genome analysis is the identification and characterization of coding regions. Estimating the coding complexity of vertebrate genomes by measuring the kinetic complexity of mRNA populations and by sequence analysis of cDNAs is limited by the fact that any given source of mRNA represents a very biased sample of all genes. Exon trapping is a method that enables the identification of genes irrespective of their transcriptional status.
Exons were trapped from the entire mouse genome, and the resulting fragments cloned. About 7% of a random sample of exons taken from this library have significant structural homology or sequence similarity to previously sequenced genes. Using cDNAs derived from several stages of mouse development, evidence for expression of about 62% of this sample of exons was found. These data suggest that the great majority of 'exons' in the library are derived from genes. We estimate that the fraction of the genome contained in trapped exons is 2.4%; this corresponds to a sequence complexity of about 72 megabases.
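The two figures in the estimate above (2.4% of the genome, about 72 megabases) together imply a haploid genome size. The abstract does not state that size. The calculation below back-calculates it as an illustrative check. The symbols $f$, $C_{\text{exon}}$ and $G$ are introduced here for convenience and are not the authors' notation.

```latex
% f          : fraction of the genome contained in trapped exons (from the abstract)
% C_exon     : sequence complexity of the trapped exons (from the abstract)
% G          : haploid genome size implied by these two numbers (back-calculated, not stated)
f = 0.024, \qquad C_{\text{exon}} \approx 72\ \text{Mb}
\quad\Longrightarrow\quad
G \approx \frac{C_{\text{exon}}}{f} = \frac{72\ \text{Mb}}{0.024} = 3000\ \text{Mb} \approx 3 \times 10^{9}\ \text{bp}
```

The implied value of roughly $3 \times 10^{9}$ bp is an inference from the stated numbers. It suggests the estimate assumes a haploid mouse genome of about that size.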
The library of exons trapped from the entire mouse genome probably represents one of the least biased and most comprehensive libraries of mouse coding regions, and should therefore prove very useful for finding genes during genome mapping and sequencing.