Fedorov A, Saxonov S, Fedorova L, Daizadeh I
Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA.
Nucleic Acids Res. 2001 Apr 1;29(7):1464-9. doi: 10.1093/nar/29.7.1464.
Of the rules used by the splicing machinery to precisely determine intron-exon boundaries, only a fraction is known. Recent evidence suggests that specific short sequences within exons help to define these boundaries. Such sequences are known as exonic splicing enhancers (ESEs). One possible bioinformatic approach to studying ESE sequences is to compare genes that harbor introns with genes that lack them. For this purpose, two non-redundant samples were created: 719 intron-containing and 63 intron-lacking human genes. We performed a statistical analysis on these datasets of intron-containing and intron-lacking human coding sequences and found a statistically significant difference (P = 0.01) between the samples in their distributions of 5- and 6-mer oligonucleotides. The difference is not created by a few strong signals present in the majority of exons, but rather by the accumulation of multiple weak signals arising from small variations in codon frequencies, codon biases and context-dependent codon biases between the samples. Our analysis yields a list of putative novel human splicing regulatory sequences.
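The abstract does not state which statistic produced P = 0.01. As a rough illustration of the general idea, the sketch below counts overlapping k-mers (k = 5 or 6) in two sets of coding sequences, measures the gap between their frequency profiles with an L1 distance, and estimates significance with a permutation test that reshuffles gene labels between the two samples. The distance measure, the permutation scheme and all function names (`kmer_counts`, `profile_distance`, `permutation_p_value`) are assumptions for illustration, not the authors' method.

```python
import random
from collections import Counter

BASES = frozenset("ACGT")

def kmer_counts(seqs, k):
    """Count overlapping k-mers made only of A/C/G/T across all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if BASES.issuperset(word):  # skip k-mers containing N or other symbols
                counts[word] += 1
    return counts

def profile_distance(ca, cb):
    """L1 distance between two k-mer count tables after normalizing each to frequencies."""
    ta = sum(ca.values()) or 1
    tb = sum(cb.values()) or 1
    return sum(abs(ca[w] / ta - cb[w] / tb) for w in set(ca) | set(cb))

def permutation_p_value(seqs_a, seqs_b, k, n_perm=999, seed=0):
    """Observed distance and a one-sided permutation P value over shuffled gene labels."""
    # Assumed scheme: whole genes are the exchangeable units, so labels are shuffled per gene.
    per_gene = [kmer_counts([s], k) for s in list(seqs_a) + list(seqs_b)]
    n_a = len(seqs_a)

    def distance(genes):
        a, b = Counter(), Counter()
        for c in genes[:n_a]:
            a.update(c)  # first n_a genes form pseudo-sample A
        for c in genes[n_a:]:
            b.update(c)  # remaining genes form pseudo-sample B
        return profile_distance(a, b)

    observed = distance(per_gene)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(per_gene)
        if distance(per_gene) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy sequences only (hypothetical); real input would be the two gene samples.
    intron_containing = ["ATGGAAGAAGAAGATGAATAA", "ATGGACGAAGAGGATTAA"]
    intron_lacking = ["ATGCCGCCGGCGCTGTAA", "ATGCTGCCCGCCGCGTGA"]
    for k in (5, 6):
        d, p = permutation_p_value(intron_containing, intron_lacking, k, n_perm=199)
        print(f"k={k}: distance={d:.3f}, permutation P={p:.3f}")
```

Per-gene counts are computed once, so each permutation only re-sums existing tables. This keeps a pure-Python run over several hundred genes tractable, although a real analysis would also need to control for the codon-usage effects that the abstract identifies as the source of the signal.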