Wu XianMing, Hurst Laurence D
Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset, United Kingdom.
Mol Biol Evol. 2016 Feb;33(2):518-29. doi: 10.1093/molbev/msv251. Epub 2015 Nov 5.
Where in genes do pathogenic mutations tend to occur, and does this provide clues as to the possible underlying mechanisms by which single nucleotide polymorphisms (SNPs) cause disease? As splice-disrupting mutations occur predominantly at exon ends, which are also known to be hot spots of cis-exonic splice control elements, we examine the relationship between the relative density of such exonic cis-motifs and pathogenic SNPs. In particular, we focus on the intragene distribution of exonic splicing enhancers (ESEs) and the covariance between them and disease-associated SNPs. In addition to showing that disease-causing genes tend to be genes with a high intron density, consistent with missplicing, we consider five factors established as trends in ESE usage: relative position in exons, relative position in genes, flanking intron size, splice site usage, and phase. We find that more than 76% of pathogenic SNPs are within 3-69 bp of exon ends, where ESEs generally reside, this being 13% more than expected. Overall, from the enrichment of pathogenic SNPs at exon ends, we estimate that approximately 20-45% of SNPs affect splicing. Importantly, we find that within genes pathogenic SNPs tend to occur in splicing-relevant regions with low ESE density: they occur preferentially in the terminal half of genes, in exons flanked by short introns, and at the ends of phase (0,0) exons with a 3' non-"AGgt" splice site. We suggest the concept of the "fragile" exon: an exon that is home to pathogenic SNPs because its low ESE density leaves it vulnerable to splice disruption.
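The exon-end statistic quoted in the abstract (the share of pathogenic SNPs lying 3-69 bp from an exon end, compared with an expectation) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The coordinate convention (1-based, inclusive exon intervals), the distance definition (the boundary base itself counts as distance 1), the uniform-placement null model, and the names `fraction_near_exon_ends` and `expected_fraction_uniform` are all assumptions made here for illustration, and the coordinates in the usage lines are toy values rather than data from the paper.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: an exon is (start, end), 1-based and inclusive.
Exon = Tuple[int, int]


def end_distance(pos: int, exon: Exon) -> int:
    """1-based distance from pos to the nearer exon boundary.

    Assumption: the boundary base itself has distance 1.
    """
    start, end = exon
    return min(pos - start, end - pos) + 1


def fraction_near_exon_ends(
    snps: Iterable[int], exons: List[Exon], lo: int = 3, hi: int = 69
) -> float:
    """Fraction of exonic SNPs whose distance to the nearest exon end is in [lo, hi].

    SNPs that fall outside every exon are ignored.
    """
    exonic = 0
    in_window = 0
    for pos in snps:
        for exon in exons:
            if exon[0] <= pos <= exon[1]:
                exonic += 1
                if lo <= end_distance(pos, exon) <= hi:
                    in_window += 1
                break  # assume non-overlapping exons
    if exonic == 0:
        raise ValueError("no SNP falls inside an exon")
    return in_window / exonic


def expected_fraction_uniform(
    exons: List[Exon], lo: int = 3, hi: int = 69
) -> float:
    """Expected fraction under an assumed null: SNPs uniform over exonic bases."""
    total = 0
    hits = 0
    for start, end in exons:
        for pos in range(start, end + 1):
            total += 1
            if lo <= end_distance(pos, (start, end)) <= hi:
                hits += 1
    return hits / total


# Toy usage: one 200-bp exon and a handful of made-up SNP positions.
toy_exons = [(100, 299)]
toy_snps = [100, 102, 168, 169, 231, 298, 500]
observed = fraction_near_exon_ends(toy_snps, toy_exons)
expected = expected_fraction_uniform(toy_exons)
enrichment = observed / expected
```

A ratio of `observed` to `expected` above 1 would indicate enrichment near exon ends under this simple null. A realistic analysis would need a null that accounts for exon length distribution and mutational biases, which this sketch deliberately leaves out.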