Institute of Virology, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf 40225, Germany.
Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, Recklinghausen 45665, Germany.
Nucleic Acids Res. 2022 Aug 26;50(15):8834-8851. doi: 10.1093/nar/gkac663.
Correct pre-mRNA processing in higher eukaryotes depends heavily on splice site recognition. Beyond the conserved 5'ss and 3'ss motifs, splicing regulatory elements (SREs) play a pivotal role in this recognition process. Here, we present in silico-designed sequences with arbitrary, a priori prescribed splicing regulatory HEXplorer properties that can be concatenated to arbitrary length without changing their regulatory properties. We experimentally validated the in silico predictions in a massively parallel splicing reporter assay on more than 3000 sequences and, as examples, identified some SRE-binding proteins. Aiming at a unified 'functional splice site strength' that encompasses both U1 snRNA complementarity and the impact of neighboring SREs, we developed a novel RNA-seq-based 5'ss usage landscape. It maps the competition between pairs of high-confidence 5'ss and neighboring exonic GT sites along HBond and HEXplorer score coordinate axes in human fibroblast and endothelium transcriptome datasets. These RNA-seq data served as the basis for a logistic 5'ss usage prediction model, which greatly improved discrimination between strong but unused exonic GT sites and annotated, highly used 5'ss. Our 5'ss usage landscape offers a unified view of the impact of 5'ss and their SRE neighborhood on splice site recognition, and may contribute to improved mutation assessment in human genetics.
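The idea of a logistic 5'ss usage model over the two score axes can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the coefficient values are invented and not fitted to any data, the `SiteFeatures` type and `usage_probability` function are hypothetical names, the two scores are treated as given inputs rather than computed, and a higher HEXplorer-derived value is assumed to mean a more supportive SRE neighborhood.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteFeatures:
    """Two coordinates of a GT site in the usage landscape (inputs, not computed here)."""
    hbond: float      # HBond score: complementarity between the site and U1 snRNA
    hexplorer: float  # HEXplorer-derived score summarizing the SRE neighborhood


# Hypothetical coefficients, chosen only to make the example readable.
INTERCEPT = -8.0
W_HBOND = 0.5
W_HEXPLORER = 0.1


def usage_probability(site: SiteFeatures) -> float:
    """Logistic model: probability that the GT site is used as a 5'ss."""
    z = INTERCEPT + W_HBOND * site.hbond + W_HEXPLORER * site.hexplorer
    return 1.0 / (1.0 + math.exp(-z))


# Two sites with identical U1 complementarity but different SRE neighborhoods.
annotated_5ss = SiteFeatures(hbond=17.0, hexplorer=20.0)
exonic_gt = SiteFeatures(hbond=17.0, hexplorer=-30.0)

p_annotated = usage_probability(annotated_5ss)
p_exonic = usage_probability(exonic_gt)

print(f"annotated 5'ss: {p_annotated:.3f}")
print(f"exonic GT site: {p_exonic:.3f}")
```

In this toy setup the two sites cannot be separated by HBond score alone; only the neighborhood term moves them to opposite sides of the 0.5 decision threshold, which is the kind of discrimination between strong but unused exonic GT sites and used 5'ss that the abstract describes.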