Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan.
Taiwan AI Labs & Foundation, Taipei, 10351, Taiwan.
BMC Genomics. 2024 Sep 3;25(Suppl 3):830. doi: 10.1186/s12864-024-10667-7.
Alternative splicing is a pivotal mechanism of post-transcriptional modification that contributes to transcriptome plasticity and proteome diversity in metazoan cells. Although many splicing regulatory mechanisms acting around exon/intron regions are known, the relationship between promoter-bound transcription factors and downstream alternative splicing remains largely unexplored.
In this study, we present computational approaches to unravel the regulatory relationship between promoter-bound transcription factor binding sites (TFBSs) and splicing patterns. We curated a dataset that includes DNase I hypersensitive site sequencing and transcriptomes across fifteen human tissues from ENCODE. Specifically, we proposed different representations of TF binding context and splicing patterns to examine the associations between the promoter and downstream splicing events. While machine learning models demonstrated potential in predicting splicing patterns based on TFBS occupancies, careful examination with different cross-validation methods revealed limited generalization when predicting the splicing forms of singleton genes across diverse tissues. We further investigated the association between alterations in individual TFBSs at promoters and shifts in exon splicing efficiency. Our results demonstrate that convolutional neural network (CNN) models trained on TF binding changes in promoters can predict changes in splicing patterns. Furthermore, a systematic in silico substitution analysis of the CNN models highlighted several potential splicing regulators. Notably, through empirical validation with K562 CTCFL shRNA knock-down data, we showed a significant role of CTCFL in splicing regulation.
In conclusion, our findings highlight the potential role of promoter-bound TFBSs in influencing the regulation of downstream splicing patterns and provide insights for discovering alternative splicing regulation.
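As an illustration of the in silico substitution idea mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a trained model can be wrapped as a function `predict` that maps a vector of per-TFBS binding changes to a predicted splicing change, and it scores each feature by how much the prediction shifts when that feature is replaced with a baseline value. The names `in_silico_substitution`, `toy_model`, and the baseline of 0.0 are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence


def in_silico_substitution(
    predict: Callable[[Sequence[float]], float],
    samples: Sequence[Sequence[float]],
    baseline: float = 0.0,
) -> List[float]:
    """Score each feature by the mean absolute prediction shift
    observed when that feature is replaced with `baseline`.

    predict  : hypothetical wrapper around a trained model
    samples  : one feature vector (per-TFBS binding changes) per sample
    baseline : assumed substitution value, here "no binding change"
    """
    reference = [predict(s) for s in samples]
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        total = 0.0
        for sample, ref in zip(samples, reference):
            perturbed = list(sample)
            perturbed[j] = baseline  # substitute a single feature
            total += abs(predict(perturbed) - ref)
        scores.append(total / len(samples))
    return scores


def toy_model(features: Sequence[float]) -> float:
    """Toy stand-in for a trained model: only feature 1 matters."""
    return 2.0 * features[1]


# Three made-up samples with three hypothetical TFBS features each
samples = [[0.5, -1.0, 0.2], [1.5, 0.8, -0.3], [-0.7, 0.4, 0.9]]
scores = in_silico_substitution(toy_model, samples)
```

In this toy setting only the second feature receives a non-zero score, which is the kind of signal that would flag a candidate regulator for follow-up; with a real model, the choice of baseline and of the scoring statistic would need to follow the study's own definitions.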