Known sequence features explain half of all human gene ends.

Authors

Shkurin Aleksei, Pour Sara E, Hughes Timothy R

Affiliations

Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada.

Terrence Donnelly Centre for Cellular & Biomolecular Research, Toronto, ON M5S 3E1, Canada.

Publication information

NAR Genom Bioinform. 2023 Apr 5;5(2):lqad031. doi: 10.1093/nargab/lqad031. eCollection 2023 Jun.

Abstract

Cleavage and polyadenylation (CPA) sites define eukaryotic gene ends. CPA sites are associated with five key sequence recognition elements: the upstream UGUA, the polyadenylation signal (PAS), and U-rich sequences; the CA/UA dinucleotide where cleavage occurs; and GU-rich downstream elements (DSEs). Currently, it is not clear whether these sequences are sufficient to delineate CPA sites. Additionally, numerous other sequences and factors have been described, often in the context of promoting alternative CPA sites and preventing cryptic CPA site usage. Here, we dissect the contributions of individual sequence features to CPA using standard discriminative models. We show that models comprised only of the five primary CPA sequence features give highest probability scores to constitutive CPA sites at the ends of coding genes, relative to the entire pre-mRNA sequence, for 59% of all human genes. U1-hybridizing sequences provide a small boost in performance. The addition of all known RBP RNA binding motifs to the model increases this figure to only 61%, suggesting that additional factors beyond the core CPA machinery have a minimal role in delineating real from cryptic sites. To our knowledge, this high effectiveness of established features to predict human gene ends has not previously been documented.
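
The headline figure (the annotated CPA site receiving the highest score across the whole pre-mRNA for 59% of genes) can be read as a per-gene "top-ranked site" fraction. The sketch below is only an illustration of that kind of metric under stated assumptions, not the authors' pipeline: the function names, the `tolerance` parameter, and the toy scores are all hypothetical, and the per-position scores are assumed to come from some already-trained model.

```python
from typing import Dict, List, Tuple


def top_site_hit(scores: List[float], true_site: int, tolerance: int = 0) -> bool:
    """Return True if the highest-scoring position in the pre-mRNA lies
    within `tolerance` nucleotides of the annotated CPA site.

    `scores` holds one model probability per position (hypothetical input).
    Ties are resolved in favor of the first (most upstream) position.
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    return abs(best - true_site) <= tolerance


def fraction_top_ranked(
    genes: Dict[str, Tuple[List[float], int]], tolerance: int = 0
) -> float:
    """Fraction of genes whose annotated site is the top-scoring position."""
    hits = sum(top_site_hit(s, site, tolerance) for s, site in genes.values())
    return hits / len(genes)


# Toy data (made up for illustration): per-position scores and annotated site index.
toy_genes = {
    "geneA": ([0.1, 0.2, 0.9, 0.3], 2),  # annotated site is the top score
    "geneB": ([0.8, 0.1, 0.4, 0.2], 3),  # a cryptic upstream site outscores it
}
toy_fraction = fraction_top_ranked(toy_genes)
```

On this toy input one of the two genes counts as a hit. How close the top-scoring position must be to the annotated site is a design choice the abstract does not specify, which is why it is exposed here as a parameter.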

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6bd/10072996/1269e3b70136/lqad031fig1.jpg
