Zhang Xiang H-F, Leslie Christina S, Chasin Lawrence A
Department of Biological Sciences, Columbia University, New York, New York 10027, USA.
Genome Res. 2005 Jun;15(6):768-79. doi: 10.1101/gr.3217705.
Intronic elements flanking the splice-site consensus sequences are thought to play a role in pre-mRNA splicing. However, the generality of this role, the catalog of effective sequences, and the mechanisms involved remain poorly defined. Using molecular genetic tests, we first showed that the approximately 50-nt intronic sequences flanking exons, beyond the splice-site consensus, are generally important for splicing. We then characterized exon flank sequences on a genomic scale. The G+C content of flanks displayed a bimodal distribution, reflecting an exaggeration of this base composition in flanks relative to the gene as a whole. We divided all exons into two classes according to their flank G+C content and used computational and statistical methods to define pentamers of high relative abundance and phylogenetic conservation in exon flanks. Upstream pentamers were often common to the two classes, whereas downstream pentamers were totally different. Upstream and downstream pentamers were often identical around low G+C exons but, in contrast, were often complementary around high G+C exons. In agreement with this complementarity, predicted base pairing was more frequent between the flanks of high G+C exons. Pseudo exons did not exhibit this behavior, but rather tended to form base pairs between flanks and exon bodies. We conclude that most exons require signals in their immediate flanks for efficient splicing. G+C content is a sequence feature correlated with many genetic and genomic attributes. We speculate that there may be different mechanisms for splice-site recognition depending on G+C content.
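The flank comparisons described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it only computes the G+C fraction of a flank, counts pentamers, and reports which upstream pentamers recur identically or as reverse complements in the downstream flank. The function names and the toy sequences are hypothetical. The relative-abundance statistics, the conservation filtering, and the base-pairing predictions mentioned in the abstract are not reproduced here.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pentamer_counts(seq: str, k: int = 5) -> Counter:
    """Count all overlapping k-mers (pentamers by default) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def compare_flanks(upstream: str, downstream: str, k: int = 5):
    """Return (identical, complementary) sets of upstream k-mers.

    identical: k-mers present in both flanks.
    complementary: upstream k-mers whose reverse complement occurs downstream.
    """
    up = set(pentamer_counts(upstream, k))
    down = set(pentamer_counts(downstream, k))
    identical = up & down
    complementary = {p for p in up if reverse_complement(p) in down}
    return identical, complementary


if __name__ == "__main__":
    # Toy flanks, invented for illustration only.
    up_flank, down_flank = "GGGCCA", "TGGCCC"
    print(gc_content(up_flank))
    print(compare_flanks(up_flank, down_flank))
```

A real analysis would run these functions over the roughly 50-nt flanks of many exons and compare the counts against a background expectation. The sketch only shows the string-level operations involved.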