Emond-Rheault Jean-Guillaume, Ferreira Gabriel Reis, Lavoie-Ouellet Camille, Smith Martin A, Papadopoulou Barbara
Research Center in Infectious Diseases and Axis of Infectious and Immune Diseases, Research Center of the Centre Hospitalier Universitaire de Québec-Université Laval, 2705 Laurier Blvd, Quebec, QC, G1V 4G2, Canada.
Department of Microbiology, Infectious Disease and Immunology, Faculty of Medicine, University Laval, Quebec, QC, G1V 0A6, Canada.
BMC Genomics. 2025 Jul 1;26(1):573. doi: 10.1186/s12864-025-11767-8.
relies on posttranscriptional control to regulate gene expression. Protein-coding genes are synthesised as polycistronic precursors that are processed into individual mRNAs by trans-splicing, which adds the spliced leader (SL) RNA to the 5’ end, and by 3’ cleavage-polyadenylation. Here, we employ Nanopore direct RNA sequencing (DRS) combined with Illumina RNA-Seq to comprehensively interrogate the transcriptomes of developmental stages at single-molecule resolution.
Analysis of full-length DRS reads of poly(A)+-enriched RNA from developmental stages enabled us to precisely determine the primary SL and poly(A) sites for 52% of the protein-coding transcripts and to accurately define their 5’ and 3’ ends and the lengths of their UTRs. In addition, our analysis confirmed that the motif ‘[C/A/T] A|G’ is associated with 94.8% of the SL cleavage sites and better defined the genomic context for cleavage and polyadenylation. Overall, we observed more diversity in poly(A) sites than in SL sites per transcript. The frequency of the primary SL and poly(A) sites was 64.2% and 24%, respectively, with most transcripts having additional poly(A) sites nearby. Alternative polyadenylation was detected in 11–13% of transcripts, with ~20% of these having different primary poly(A) sites between the promastigote and amastigote developmental stages. Furthermore, DRS uncovered multiple processing events occurring mostly within 3’UTRs, leading to the formation of long non-coding RNAs (lncRNAs). The transcriptome expresses a rich repertoire of 1,825 lncRNAs, of which 98% were not previously annotated in and only 21.5% were found in . These lncRNAs generally exhibit expression patterns distinct from those of the 3’UTRs from which they derive, and several are developmentally regulated, representing ~27% of the stage-regulated transcriptome. Their expression was generally higher in amastigotes than in promastigotes, highlighting their importance in parasite intracellular development. Protein prediction tools combined with mass spectrometry revealed that 7.6% of these lncRNAs have limited protein-coding potential.
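The notion of a "primary" site and its frequency can be illustrated with a minimal sketch. It assumes that the primary site of a transcript is simply the read-end position supported by the most reads, and that its frequency is the fraction of that transcript's reads ending there; the abstract does not state the authors' exact criteria. The function names, the example coordinates, and the treatment of the motif as a plain three-nucleotide pattern are all hypothetical and chosen for illustration only.

```python
from collections import Counter
import re

# Assumption: the reported motif '[C/A/T] A|G' is read here as a plain
# three-nucleotide pattern (C, A or T, followed by A, then G).
SL_MOTIF = re.compile(r"[CAT]AG")


def primary_site(positions):
    """Return (position, fraction of reads) for the most used site.

    `positions` holds one genomic coordinate per read, e.g. the position
    where each read's SL or poly(A) tail was mapped (hypothetical input).
    """
    counts = Counter(positions)
    site, n_reads = counts.most_common(1)[0]
    return site, n_reads / len(positions)


def matches_sl_motif(trinucleotide):
    """True if the three nucleotides match [C/A/T]AG (case-insensitive)."""
    return SL_MOTIF.fullmatch(trinucleotide.upper()) is not None


# Made-up read-end coordinates for a single transcript.
sl_ends = [1200, 1200, 1200, 1187, 1200]
site, fraction = primary_site(sl_ends)
print(site, fraction)
print(matches_sl_motif("cag"), matches_sl_motif("GAG"))
```

Under these assumptions, a transcript with alternative polyadenylation between stages would be one for which `primary_site` returns different positions for promastigote and amastigote reads.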
This is the first comprehensive transcriptomic analysis of developmental stages using single-molecule Nanopore DRS. Our findings extend existing expression datasets and provide new insights into the transcriptome complexity and dynamics of both protein-coding and non-coding sequences throughout parasite development.
The online version contains supplementary material available at 10.1186/s12864-025-11767-8.