Kirov Ilya, Dudnikov Maxim, Merkulov Pavel, Shingaliev Andrey, Omarov Murad, Kolganova Elizaveta, Sigaeva Alexandra, Karlov Gennady, Soloviev Alexander
Laboratory of Marker-Assisted and Genomic Selection of Plants, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya str. 42, 127550 Moscow, Russia.
Kurchatov Genomics Center of ARRIAB, All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Street, 42, 127550 Moscow, Russia.
Plants (Basel). 2020 Dec 17;9(12):1794. doi: 10.3390/plants9121794.
The intergenic space of plant genomes encodes many functionally important yet unexplored RNAs. The genomic loci encoding these RNAs are often considered "junk" DNA, as they are frequently associated with repeat-rich regions of the genome. This makes annotating these loci and assembling the corresponding transcripts from short RNA-seq reads particularly challenging. Here, using long-read Nanopore direct RNA sequencing, we aimed to identify these "junk" RNA molecules, including long non-coding RNAs (lncRNAs) and transposon-derived transcripts, expressed at an early stage of seed development (10 days post anthesis) in triticale (AABBRR, 2n = 6x = 42), an interspecific hybrid between wheat and rye. Altogether, we found 796 lncRNAs and 20 LTR retrotransposon-related transcripts (RTE-RNAs) expressed at this stage; most of them were previously unannotated and are located in intergenic as well as intronic regions. Sequence analysis of the lncRNAs provides evidence for frequent exonization of Class I (retrotransposon) and Class II (DNA transposon) sequences and suggests that "junk" DNA directly influences the structure and origin of lncRNAs. We show that the expression patterns of lncRNAs and RTE-related transcripts are highly stage-specific. Furthermore, almost half of the lncRNAs located in the A and B genomes have their highest expression levels at 10-30 days post anthesis in wheat. Detailed analysis of the protein-coding potential of the RTE-RNAs showed that 75% of them carry open reading frames (ORFs) encoding a diverse set of GAG proteins, the main component of the virus-like particles of LTR retrotransposons. We further demonstrated experimentally that some RTE-RNAs originate from autonomous LTR retrotransposons with ongoing transposition activity during early stages of triticale seed development.
Overall, our results provide a framework for further exploration of the newly discovered lncRNAs and RTE-RNAs in functional and genome-wide association studies in triticale and wheat. Our study also demonstrates that Nanopore direct RNA sequencing is an indispensable tool for the elucidation of lncRNA and retrotransposon transcripts.
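The coding-potential result above rests on detecting ORFs in the RTE-RNA sequences; for the 20 RTE-RNAs reported, 75% corresponds to 15 transcripts. The abstract does not say which tool or length cutoff the authors used, so the following is only a minimal sketch of a generic ORF scan, not their pipeline. It assumes the standard genetic code (ATG start; TAA, TAG, TGA stops), scans only the sense strand (a reasonable simplification because direct RNA reads are strand-specific), and uses a hypothetical `min_codons` cutoff. The function names `find_orfs` and `fraction_with_orf` are illustrative.

```python
# Minimal ORF-scan sketch (hypothetical helper, not the authors' pipeline).
# Assumptions: standard genetic code, sense strand only, and an
# illustrative minimum ORF length given in codons (stop codon excluded).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon. For each stop codon,
    only the most upstream in-frame ATG is used (the longest ORF).
    """
    # Direct RNA reads contain U; convert to the DNA alphabet for matching.
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


def fraction_with_orf(transcripts: list[str], min_codons: int = 100) -> float:
    """Share of transcripts carrying at least one ORF above the cutoff."""
    hits = sum(1 for s in transcripts if find_orfs(s, min_codons))
    return hits / len(transcripts)


if __name__ == "__main__":
    # Toy inputs only; real RTE-RNA reads would be far longer.
    print(find_orfs("AUGAAAUAG", min_codons=2))
    print(fraction_with_orf(["AUGAAAUAG", "CCCCCC"], min_codons=2))
```

On the toy input `AUGAAAUAG` with `min_codons=2`, the scan reports one ORF in frame 0 spanning positions 0 to 9. Deciding that an ORF encodes a GAG protein would additionally require a protein-domain search on the translated sequence, which this sketch does not attempt.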