Hakim Jill M C, Guarnizo Sneider A Gutierrez, Machaca Edith Málaga, Gilman Robert H, Mugnier Monica R
bioRxiv. 2023 Jul 27:2023.07.27.550875. doi: 10.1101/2023.07.27.550875.
Trypanosoma cruzi is the causative agent of Chagas disease, which causes 10,000 deaths per year. Despite the high mortality caused by the pathogen, relatively few parasite genomes have been assembled to date; even some commonly used laboratory strains do not have publicly available genome assemblies. This is at least partially due to T. cruzi's highly complex and highly repetitive genome: while describing the variation in genome content and structure is critical to better understanding T. cruzi biology and the mechanisms that underlie Chagas disease, the complexity of the genome defies investigation using traditional short read sequencing methods. Here, we have generated a high-quality whole genome assembly of the hybrid Tulahuen strain, a commercially available Type VI strain, using long read Nanopore sequencing without short read scaffolding. Using automated tools and manual curation for annotation, we report a genome with 25% repeat regions, 17% variable multigene family members, and 27% transposable elements. Notably, we find that regions with transposable elements are significantly enriched for surface proteins, and that on average surface proteins are closer to transposable elements compared to other coding regions. This finding supports a possible mechanism for diversification of surface proteins in which mobile genetic elements such as transposons facilitate recombination within the gene family. This work demonstrates the feasibility of nanopore sequencing to resolve complex regions of genomes, and with these resolved regions, provides support for a possible mechanism for genomic diversification.
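The abstract does not say how the distance between coding regions and transposable elements was measured. As a minimal sketch of one way such a comparison could be set up, the Python below computes each gene's distance to its nearest transposable element on a single contig and averages the distances per gene class. The function names, the half-open (start, end) interval convention, and the toy coordinates are all assumptions made for illustration; none of them come from the study, and no significance test is included.

```python
from statistics import mean


def gap(a, b):
    """Distance in bp between two half-open intervals (start, end).

    Returns 0 when the intervals overlap or touch.
    """
    return max(0, b[0] - a[1], a[0] - b[1])


def nearest_te_distance(gene, tes):
    """Distance from one gene interval to the closest TE interval."""
    return min(gap(gene, te) for te in tes)


def mean_distance_by_class(genes, tes):
    """Average nearest-TE distance for each gene class.

    genes: list of (start, end, label) tuples on one contig.
    tes:   list of (start, end) tuples on the same contig.
    """
    by_label = {}
    for start, end, label in genes:
        distance = nearest_te_distance((start, end), tes)
        by_label.setdefault(label, []).append(distance)
    return {label: mean(values) for label, values in by_label.items()}


# Toy coordinates (made up for illustration, not data from the paper).
tes = [(1000, 1500), (8000, 8600)]
genes = [
    (1600, 2400, "surface"),
    (7000, 7900, "surface"),
    (4000, 5000, "other"),
    (12000, 13000, "other"),
]

result = mean_distance_by_class(genes, tes)
print(result)
```

On real annotations the same idea would be applied per contig, and the brute-force scan over all transposable elements would normally be replaced by a sorted-interval lookup or an existing interval tool.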