Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany.
Institute of Clinical Neuroscience and Medical Psychology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany.
Genome Res. 2018 Dec;28(12):1826-1840. doi: 10.1101/gr.235861.118. Epub 2018 Oct 24.
Most human pathogenic mutations in 5' splice sites (5'ss) affect the canonical GT in positions +1 and +2, leading to noncanonical dinucleotides. On the other hand, noncanonical dinucleotides are observed under physiological conditions in ∼1% of all human 5'ss. It is therefore a challenging task to understand both the pathogenic mutation mechanisms and the conditions under which noncanonical 5'ss are used. In this work, we systematically examined noncanonical 5'ss selection, both experimentally using splicing competition reporters and by analyzing a large RNA-seq data set of 54 fibroblast samples from 27 subjects containing a total of 2.4 billion gapped reads covering 269,375 exon junctions. From both approaches, we consistently derived a noncanonical 5'ss usage ranking GC > TT > AT > GA > GG > CT. In our competition splicing reporter assay, noncanonical splicing was strictly dependent on the presence of upstream or downstream splicing regulatory elements (SREs), and changes in SREs could be compensated by variation of U1 snRNA complementarity in the competing 5'ss. In particular, we could confirm splicing at different positions (i.e., -1, +1, +5) of a splice site for all noncanonical dinucleotides "weaker" than GC. In our comprehensive RNA-seq data set analysis, noncanonical 5'ss were preferentially detected in weakly used exon junctions of highly expressed genes. Among high-confidence splice sites, noncanonical 5'ss were 10-fold overrepresented in clusters with a neighboring, more frequently used 5'ss. Conversely, these more frequently used neighbors contained only the dinucleotides GT, GC, and TT, in accordance with the above ranking.
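To illustrate the kind of tally behind a usage ranking such as GC > TT > AT, the following is a minimal sketch that reads the dinucleotide at intron positions +1 and +2 and ranks noncanonical dinucleotides by supporting gapped reads. It is not the authors' pipeline: the function names, the `(intron_seq, read_count)` input shape, and the toy numbers are assumptions made for illustration only.

```python
from collections import Counter


def donor_dinucleotide(intron_seq: str) -> str:
    """Return the dinucleotide at intron positions +1 and +2 (DNA alphabet)."""
    if len(intron_seq) < 2:
        raise ValueError("intron sequence must have at least 2 nucleotides")
    # Normalize case and map RNA 'U' to DNA 'T' so GU and GT count together.
    return intron_seq[:2].upper().replace("U", "T")


def rank_noncanonical(junctions):
    """Rank noncanonical 5'ss dinucleotides by total gapped-read support.

    `junctions` is a hypothetical iterable of (intron_seq, read_count)
    pairs, where intron_seq starts at position +1 of the 5' splice site.
    """
    reads = Counter()
    for intron_seq, count in junctions:
        dinuc = donor_dinucleotide(intron_seq)
        if dinuc != "GT":  # skip the canonical dinucleotide
            reads[dinuc] += count
    # most_common() sorts by descending read count.
    return [dinuc for dinuc, _ in reads.most_common()]


# Toy input, invented for illustration; not data from the study.
toy_junctions = [
    ("GTAAGT", 1000),
    ("GCAAGT", 50),
    ("TTAAGT", 20),
    ("gcaagt", 5),
    ("AUAAGU", 3),
]
ranking = rank_noncanonical(toy_junctions)
print(" > ".join(ranking))
```

Weighting by read count rather than by the number of distinct junctions is one possible choice; a real analysis would also need to filter alignment artifacts and define high-confidence splice sites, which this sketch leaves out.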