Zhang M Q
Cold Spring Harbor Laboratory, PO Box 100, Cold Spring Harbor, NY 11724, USA.
Hum Mol Genet. 1998 May;7(5):919-32. doi: 10.1093/hmg/7.5.919.
To facilitate gene finding and the investigation of human molecular genetics on a genome scale, we present a comprehensive survey of various statistical features of human exons. We first show that human exons with flanking genomic DNA sequences can be classified into 12 mutually exclusive categories. This classification could serve as a standard for future studies so that results can be compared directly. A database for eight categories (related to human genes in which coding regions are split by introns) was built from GenBank release 87.0 and analyzed by a number of methods to characterize statistical features of these sequences that may serve as controls or regulatory signals for gene expression. The statistical information compiled includes profiles of signals for transcription, splicing and translation, various compositional statistics and size distributions. Further analyses reveal novel correlations and constraints among different splicing features across an internal exon that are consistent with the Exon Definition model. This information is fundamental for a quantitative view of human gene organization, and should be invaluable to individual scientists designing human molecular genetics experiments.
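The "profiles of signals" mentioned in the abstract are commonly tabulated as per-position base frequencies over aligned sequence windows. The following is a minimal sketch of that general idea, not the method used in the paper: the function name `position_frequencies` and the toy donor-site windows are hypothetical and chosen only for illustration.

```python
from collections import Counter


def position_frequencies(sites):
    """Return per-position A/C/G/T frequencies for equal-length aligned sequences.

    Hypothetical helper for illustration; not taken from the surveyed paper.
    """
    if not sites:
        raise ValueError("need at least one sequence")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sequences must have the same length")

    profile = []
    for i in range(length):
        # Count the bases observed in column i of the alignment.
        counts = Counter(s[i].upper() for s in sites)
        # Only unambiguous bases contribute; avoid division by zero.
        total = sum(counts[b] for b in "ACGT") or 1
        profile.append({b: counts[b] / total for b in "ACGT"})
    return profile


# Toy 5' splice-site windows (made-up examples, NOT data from the survey).
toy_donor_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGC"]
profile = position_frequencies(toy_donor_sites)

for pos, freqs in enumerate(profile):
    print(pos, " ".join(f"{b}={freqs[b]:.2f}" for b in "ACGT"))
```

In this toy input the invariant GT dinucleotide appears as columns where G and T each have frequency 1.0, which is the kind of feature such a profile is meant to expose.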