Ambiguous splice sites distinguish circRNA and linear splicing in the human genome.

Author information

Department of Biochemistry, Stanford University, Stanford, CA, USA.

Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.

Publication information

Bioinformatics. 2019 Apr 15;35(8):1263-1268. doi: 10.1093/bioinformatics/bty785.

Abstract

MOTIVATION

Identification of splice sites is critical to gene annotation and to determining which sequences control circRNA biogenesis. Full-length RNA transcripts could in principle complete annotations of introns and exons in genomes without external ontologies, i.e. ab initio. However, whether the genomic positions where splicing occurs can be reconstructed from full-length transcripts, even when those transcripts are sampled without noise, depends on the genome sequence composition. If they cannot, there are provable limits on the use of RNA-Seq to define splice locations (linear or circular) in the genome.

RESULTS

We provide a formal definition of splice site ambiguity due to the genomic sequence by introducing the equivalent junction: the set of local genomic positions that yield the same RNA sequence when joined through RNA splicing. We show that equivalent junctions are prevalent in diverse eukaryotic genomes and occur in 88.64% and 78.64% of annotated human splice sites in linear and circRNA junctions, respectively. The observed fractions of equivalent junctions and the frequencies of many individual motifs are statistically significant when compared against the null distribution computed via simulation or in closed form. The frequency of equivalent junctions establishes a fundamental limit on the possibility of ab initio reconstruction of RNA transcripts without appealing to the ontology of "GT-AG" boundaries defining introns. Said differently, completely ab initio reconstruction is impossible for the vast majority of splice sites in annotated circRNAs and linear transcripts.
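To make the definition concrete, the following is a minimal Python sketch of how the equivalent junctions of one splice junction might be enumerated, plus a toy null simulation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`spliced`, `equivalent_junctions`, `null_fraction`), the 0-based coordinate convention, the `max_shift` cap and the i.i.d. uniform base model are all hypothetical choices made here. The idea is that shifting both boundaries by the same offset leaves the spliced RNA unchanged exactly when the bases gained on one side equal the bases lost on the other.

```python
import random


def spliced(genome, donor_end, acceptor_start):
    """Linear splice product: keep genome[..donor_end], then genome[acceptor_start..]."""
    return genome[:donor_end + 1] + genome[acceptor_start:]


def equivalent_junctions(genome, donor_end, acceptor_start, max_shift=10):
    """Return every (donor_end, acceptor_start) pair, including the input pair,
    that yields the same spliced sequence. Coordinates are 0-based (assumption):
    donor_end is the last base kept upstream of the junction and acceptor_start
    is the first base kept downstream of it.
    """
    n = len(genome)
    pairs = [(donor_end, acceptor_start)]

    # Shift right: each base newly kept after the donor must equal
    # the base newly dropped at the acceptor.
    k = 1
    while (k <= max_shift and donor_end + k < n and acceptor_start + k - 1 < n
           and genome[donor_end + k] == genome[acceptor_start + k - 1]):
        pairs.append((donor_end + k, acceptor_start + k))
        k += 1

    # Shift left: each base newly dropped at the donor must equal
    # the base newly kept before the acceptor.
    k = 1
    while (k <= max_shift and donor_end - k >= 0 and acceptor_start - k >= 0
           and genome[donor_end - k + 1] == genome[acceptor_start - k]):
        pairs.append((donor_end - k, acceptor_start - k))
        k += 1

    return sorted(pairs)


def null_fraction(n_trials=20000, genome_len=60, seed=0):
    """Toy null (assumption: i.i.d. uniform bases): fraction of random junctions
    that have at least one equivalent junction besides themselves."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        genome = "".join(rng.choices("ACGT", k=genome_len))
        donor_end = rng.randrange(10, 20)
        acceptor_start = rng.randrange(40, 50)
        if len(equivalent_junctions(genome, donor_end, acceptor_start)) > 1:
            hits += 1
    return hits / n_trials


# Toy example: exon CCAAG, intron GTACCTTTAG, exon GACCT.
genome = "CCAAG" + "GTACCTTTAG" + "GACCT"
pairs = equivalent_junctions(genome, donor_end=4, acceptor_start=15)
```

In this toy genome the annotated junction (4, 15) shares its spliced sequence with three shifted junctions, so a read spanning the junction cannot tell the four apart without outside knowledge such as the GT-AG rule. The same local comparison should carry over to back-splice junctions, where `acceptor_start` lies upstream of `donor_end`, although `spliced` as written only builds the linear product. Under the uniform toy model, a one-base shift in either direction matches with probability 1/4, so the expected fraction returned by `null_fraction` is 1 - (3/4)^2 = 7/16; the null model used in the paper may differ.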

AVAILABILITY AND IMPLEMENTATION

Two Python scripts that generate an equivalent junction sequence per junction are available at: https://github.com/salzmanlab/Equivalent-Junctions.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
