Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA.
School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Microb Genom. 2023 Nov;9(11). doi: 10.1099/mgen.0.001117.
Complete reference genomes, including correct feature annotations, are a fundamental aspect of genomic biology. In the case of protozoan species such as Giardia, a major human and animal parasite worldwide, accurate genome annotation can deepen our understanding of the evolution of parasitism and pathogenicity by identifying genes underlying key traits and clinically relevant cellular mechanisms and, by extension, support the development of improved prevention strategies and treatments. This study used bioinformatics analyses of mRNA libraries to characterize known introns and identify new intron candidates, working towards completion of the Giardia assemblage A strain 'WB' genome and further elucidating Giardia's gene expression. By using a set of experimentally validated positive control loci to calibrate our intron detection pipeline, we were able to detect evidence of previously missed candidate splice junctions directly from expressed transcript data. These intron candidates were further studied using non-metric multidimensional scaling (NMDS) clustering to determine their shared characteristics, such as secondary structure, splicing efficiency and motif conservation, and the relative importance of each, and thus to refine intron models. This study identified 34 new intron candidates, with several potential introns showing evidence that secondary structure of the mRNA molecule might play a more significant role in splicing than previously reported eukaryotic splicing activity mediated by a reduced spliceosome present in Giardia.
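To illustrate what detecting candidate splice junctions directly from expressed transcript data can look like, the following minimal sketch counts reads whose alignment skips a stretch of the reference (the `N` operation in a SAM-style CIGAR string). It is not the pipeline used in this study: the function names, the `min_reads` threshold and the toy alignments are all hypothetical, and a real analysis would also use the positive control loci for calibration and examine splice motifs.

```python
import re
from collections import Counter

# SAM CIGAR operations: a length followed by a one-letter operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(ref_start, cigar):
    """Return (intron_start, intron_end) pairs for every N operation.

    Coordinates are 0-based and half-open, relative to the reference.
    """
    pos = ref_start
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped reference region: candidate intron
            found.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return found


def count_junctions(alignments, min_reads=2):
    """Count reads supporting each junction; keep those with enough support.

    `alignments` is an iterable of (chrom, ref_start, cigar) tuples.
    `min_reads` is a hypothetical support threshold, not a value from the study.
    """
    counts = Counter()
    for chrom, ref_start, cigar in alignments:
        for start, end in junctions_from_cigar(ref_start, cigar):
            counts[(chrom, start, end)] += 1
    return {junction: n for junction, n in counts.items() if n >= min_reads}


# Toy alignments (made up for illustration only).
toy_alignments = [
    ("chr_toy", 100, "20M35N30M"),         # spliced read
    ("chr_toy", 110, "10M35N40M"),         # second read, same junction
    ("chr_toy", 300, "50M"),               # unspliced read
    ("chr_toy", 400, "5S25M2D10M60N15M"),  # junction seen only once
]
candidates = count_junctions(toy_alignments, min_reads=2)
print(candidates)
```

With these toy inputs, only the junction supported by two reads passes the threshold; the singleton is dropped, mirroring how read support can be used to separate candidate introns from alignment noise.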