Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany.
Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615 Bielefeld, Germany.
Cells. 2020 Feb 18;9(2):458. doi: 10.3390/cells9020458.
Most protein-encoding genes in eukaryotes contain introns, which are interwoven with exons. Introns need to be removed from initial transcripts to generate the final messenger RNA (mRNA), which can be translated into an amino acid sequence. Precise excision of introns by the spliceosome requires conserved dinucleotides, which mark the splice sites. However, there are variations of the highly conserved combination of GT at the 5' end and AG at the 3' end of an intron in the genome. GC-AG and AT-AC are two major non-canonical splice site combinations, which have been known for years. Recently, various minor non-canonical splice site combinations covering numerous dinucleotide permutations were detected. Here, we extend systematic investigations of non-canonical splice site combinations from plants to other eukaryotes by analyzing fungal and animal genome sequences. Comparisons of splice site combinations between these three kingdoms revealed several differences, such as an apparently increased CT-AC frequency in fungal genome sequences. Canonical GT-AG splice site combinations in antisense transcripts are a likely explanation for this observation, thus indicating annotation errors. In addition, high numbers of GA-AG splice site combinations were observed in and . A variant in one U1 small nuclear RNA (snRNA) isoform might allow the recognition of GA as a 5' splice site. In-depth investigation of splice site usage based on RNA-Seq read mappings indicates a generally higher flexibility of the 3' splice site compared to the 5' splice site across animals, fungi, and plants.
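The link between CT-AC and antisense annotation can be checked with a few lines of code: an intron with canonical GT-AG borders, when read from the opposite strand, shows CT at its start and AC at its end. The following is a minimal sketch of that reasoning, not the pipeline used in the study; the function names and the example intron sequence are hypothetical and chosen only for illustration.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def splice_site_combination(intron: str) -> str:
    """Return the terminal dinucleotides of an intron, e.g. 'GT-AG'."""
    if len(intron) < 4:
        raise ValueError("intron too short to carry two dinucleotides")
    intron = intron.upper()
    return f"{intron[:2]}-{intron[-2:]}"


# Hypothetical intron sequence, invented for illustration only.
intron = "GTAAGTTCTGATTTCAG"

# Read on the correct strand, the borders are canonical.
print(splice_site_combination(intron))                      # GT-AG

# Annotated on the wrong strand, the same intron appears as CT-AC.
print(splice_site_combination(reverse_complement(intron)))  # CT-AC
```

Under this view, an excess of CT-AC introns in an annotation is consistent with gene models placed on the wrong strand rather than with a genuinely distinct splice site.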