York Biomedical Research Institute, Department of Biology and York Biomedical Research Institute, University of York, York, YO10 5DD, UK.
BMC Genomics. 2024 Oct 29;25(1):1011. doi: 10.1186/s12864-024-10862-6.
BACKGROUND: Trypanosomatid parasites are a group of protozoans that cause devastating diseases that disproportionately affect developing countries. These protozoans have developed several mechanisms to survive in the mammalian host, such as extensive expansion of multigene families involved in host-parasite interaction, adaptations to invade and modulate host cells, and the presence of aneuploidy and polyploidy. Two mechanisms might result in "complex" isolates, with more than two haplotypes present in a single sample: multiplicity of infection (MOI) and polyploidy. We have developed and validated a methodology to identify multiclonal infections and polyploidy from whole genome sequencing reads, based on fluctuations in allelic read depth at heterozygous positions. It can be easily implemented in experiments ranging from the sequencing of a single sample to large population surveys.
RESULTS: The methodology estimates the complexity index (CI) of an isolate and compares real samples with simulated clonal infections at the individual and population levels, excluding regions with somy and gene copy number variation. It was first validated with simulated MOI and known polyploid isolates from Leishmania and Trypanosoma cruzi, respectively. The approach was then used to assess the complexity of infection using genome-wide SNP data from 497 trypanosomatid samples from four clades (L. donovani/L. infantum, L. braziliensis, T. cruzi and T. brucei), providing an overview of multiclonal infection and polyploidy in these cultured parasites. We show that our method robustly detects complex infections in samples with at least 25x coverage and 100 heterozygous SNPs, and where 5-10% of the reads correspond to the secondary clone. We find that relatively small proportions (≤ 7%) of cultured trypanosomatid isolates are complex.
CONCLUSIONS: The method can accurately identify polyploid isolates, and can identify multiclonal infections when genome read coverage is sufficient. We package our method as a single R script that requires only a standard variant call format (VCF) file to run (https://github.com/jaumlrc/Complex-Infections). Our analyses indicate that multiclonality and polyploidy occur in all clades, but not very frequently in cultured trypanosomatids. We caution that our estimates are lower bounds due to the limitations of current laboratory and bioinformatic methods.
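The allelic read depth idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R script, and it does not compute their complexity index (CI), which the abstract does not define. It assumes a single-sample, biallelic VCF whose FORMAT column carries `GT` and `AD`, uses mean depth at heterozygous sites as a rough stand-in for genome coverage, and reports a simple illustrative summary: the mean absolute deviation of the alternate-allele read fraction from 0.5. The function names and that summary statistic are hypothetical; only the 25x and 100-SNP thresholds come from the abstract.

```python
from statistics import mean

MIN_DEPTH = 25       # coverage threshold quoted in the abstract
MIN_HET_SNPS = 100   # heterozygous SNP threshold quoted in the abstract


def het_allele_fractions(vcf_lines):
    """Return (alt_read_fraction, depth) for each heterozygous biallelic site.

    Assumption: single-sample VCF with GT and AD present in FORMAT.
    """
    sites = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column
        alt = fields[4]
        call = dict(zip(fields[8].split(":"), fields[9].split(":")))
        if "," in alt:
            continue  # keep biallelic sites only
        genotype = call.get("GT", ".").replace("|", "/")
        if genotype not in ("0/1", "1/0"):
            continue  # keep heterozygous calls only
        try:
            ref_reads, alt_reads = (int(x) for x in call.get("AD", ".").split(","))
        except ValueError:
            continue  # missing or malformed allele depths
        depth = ref_reads + alt_reads
        if depth > 0:
            sites.append((alt_reads / depth, depth))
    return sites


def summarise(sites):
    """Summarise allele balance; an illustrative statistic, not the paper's CI."""
    if not sites:
        return {"n_het": 0, "mean_depth": 0.0,
                "mean_abs_dev": None, "enough_data": False}
    mean_depth = mean(depth for _, depth in sites)
    return {
        "n_het": len(sites),
        "mean_depth": mean_depth,
        # In a clonal diploid, het sites are expected near a 0.5 read fraction.
        "mean_abs_dev": mean(abs(frac - 0.5) for frac, _ in sites),
        "enough_data": len(sites) >= MIN_HET_SNPS and mean_depth >= MIN_DEPTH,
    }
```

To try it, one could pass the lines of an uncompressed VCF, for example `summarise(het_allele_fractions(open(path)))` with a hypothetical `path`. Deciding whether a sample is actually complex would still require comparison against simulated clonal samples and exclusion of regions with somy or copy number variation, as the abstract describes.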