Emily Kunce Stroup, Tianjiao Sun, Qianru Li, John Carinato, Zhe Ji
Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.
Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, Illinois, USA.
Wiley Interdiscip Rev RNA. 2025 May-Jun;16(3):e70017. doi: 10.1002/wrna.70017.
3'-end cleavage and polyadenylation is an essential step in eukaryotic mRNA and lncRNA expression. The formation of a polyadenylation (polyA) site is determined by the combinatorial effects of multiple tandem motifs (~6 motifs in humans), each bound by a protein subcomplex. However, motif occurrence and composition vary considerably across individual polyA sites, which makes it technically challenging to quantify polyadenylation activity and to define cleavage sites. Although conventional motif enrichment analyses and machine learning models have identified contributing polyadenylation motifs, they cannot quantify crosstalk between motifs in an unbiased manner. Recently, several groups have developed deep learning models that resolve this sequence complexity, capture complex positional interactions among cis-regulatory motifs, examine polyA site formation, predict cleavage probability, and calculate site strength. These models have brought novel insights into polyadenylation biology, such as differences in site configuration across species, cleavage heterogeneity, genomic parameters regulating site expression, and human genetic variants that alter polyadenylation activity. In this review, we summarize advances in deep learning models developed to address different facets of polyadenylation regulation and discuss their applications.
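As a rough illustration of how a sequence-based deep learning model can detect polyA motifs, the following sketch shows a single convolutional filter scanning a one-hot encoded DNA sequence for the canonical polyA signal hexamer AATAAA (AAUAAA in RNA). This is a hypothetical toy, not any of the published models discussed in this review: the filter weights, the bias of -3, and the helper names (`one_hot`, `motif_filter`, `conv1d`, `motif_score`) are illustrative assumptions. Real models learn many such filters from data and combine their outputs in deeper layers to capture positional interactions among motifs.

```python
# Hypothetical sketch: one hand-set Conv1D-style filter acting as a detector
# for the polyA signal hexamer AATAAA. Not any published model.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as 4-element one-hot vectors in A, C, G, T order."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def motif_filter(motif, match=1.0, mismatch=-1.0):
    """Assumed filter: +1 for the motif base at each position, -1 otherwise."""
    return [[match if base == m else mismatch for base in BASES] for m in motif]


def conv1d(x, w, bias=0.0):
    """Valid 1D cross-correlation of one-hot input x with filter w."""
    k = len(w)
    out = []
    for i in range(len(x) - k + 1):
        s = bias
        for j in range(k):
            s += sum(xv * wv for xv, wv in zip(x[i + j], w[j]))
        out.append(s)
    return out


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def motif_score(seq, motif="AATAAA", bias=-3.0):
    """Global max-pooled activation and its start index (-1 if nothing fires).

    With bias -3, an exact 6-mer match scores 3.0, a single mismatch
    scores 1.0, and two or more mismatches are zeroed by the ReLU.
    """
    acts = relu(conv1d(one_hot(seq), motif_filter(motif), bias))
    if not acts or max(acts) == 0.0:
        return 0.0, -1
    best = max(acts)
    return best, acts.index(best)


if __name__ == "__main__":
    print(motif_score("GCGCAATAAAGCGCGC"))  # exact AATAAA at index 4
    print(motif_score("GCGCATTAAAGCGCGC"))  # ATTAAA variant, one mismatch
    print(motif_score("GCGCGCGCGCGCGC"))    # no signal
```

The bias is chosen so that the common single-mismatch variant ATTAAA still activates the filter, but more weakly than AATAAA, loosely mirroring the observation that variant hexamers tend to be weaker polyA signals. A trained model would instead learn its weights, biases, and the way filter outputs are combined, which is what lets it capture motif crosstalk that a fixed motif scan cannot.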