Nair Shalini, Li Xue, Anderson Timothy J C, Platt Roy N
Disease Intervention and Prevention Program, Texas Biomedical Research Institute, San Antonio, TX, USA.
Genome Biol Evol. 2025 Jul 3;17(7). doi: 10.1093/gbe/evaf127.
Pooled sequencing provides a rapid, cost-effective approach to assess genetic variation segregating within populations of organisms. However, such studies are typically limited to single nucleotide variants and small indels (≤50 bp), and have not been used for structural variants (SVs; >50 bp), which affect large portions of most genomes and may significantly affect phenotype. Here, we examined SVs circulating in five laboratory populations of the human parasite Schistosoma mansoni by generating long-read sequences from pools of worms (92-152 per population). Despite challenges in identifying low-frequency variants, we were able to identify and genotype 17,446 SVs, representing 6.5% of the genome. SVs included deletions (n = 8,525), duplications (n = 131), insertions (n = 8,410), inversions (n = 311), and translocations (n = 69) and were enriched in repeat regions. More than half (59%) of the SVs were shared among ≥4 populations, but 12% were found in only one of the five populations. Within this subset, we identified 168 population-specific SVs that were at or near fixation (>95% alternate allele frequency) in one population but missing (<5%) in the other four populations. Five of these variants impact the coding sequence of six genes. We also identified eight SVs with extreme allele frequency differences between populations within quantitative trait loci for biomedically important pathogen phenotypes (drug resistance, larval stage production) identified in prior genetic mapping studies. These results demonstrate that long-read sequencing of pooled individuals is a viable method to quickly catalogue SVs circulating within populations. Furthermore, some of these variants may be responsible for, or linked to, regions experiencing population-specific directional selection.
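The population-specific criterion described above (alternate allele frequency >95% in one population and <5% in each of the others) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the population labels, and the allele frequencies below are hypothetical, and only the two thresholds come from the abstract.

```python
from typing import Dict, Optional


def private_fixed_population(
    freqs: Dict[str, float], high: float = 0.95, low: float = 0.05
) -> Optional[str]:
    """Return the one population in which an SV is at or near fixation
    (frequency > high) while it is missing (frequency < low) in every
    other population. Return None if the SV does not fit that pattern.
    """
    near_fixed = [pop for pop, f in freqs.items() if f > high]
    if len(near_fixed) != 1:
        return None
    focal = near_fixed[0]
    absent_elsewhere = all(f < low for pop, f in freqs.items() if pop != focal)
    return focal if absent_elsewhere else None


# Hypothetical alternate allele frequencies for three SVs in five pools.
example_svs = {
    "sv_001": {"pop_A": 0.98, "pop_B": 0.00, "pop_C": 0.02, "pop_D": 0.01, "pop_E": 0.00},
    "sv_002": {"pop_A": 0.97, "pop_B": 0.96, "pop_C": 0.00, "pop_D": 0.00, "pop_E": 0.00},
    "sv_003": {"pop_A": 0.60, "pop_B": 0.00, "pop_C": 0.00, "pop_D": 0.00, "pop_E": 0.00},
}

# Keep only SVs that are near fixation in exactly one pool and missing elsewhere.
population_specific = {}
for sv_id, freqs in example_svs.items():
    focal = private_fixed_population(freqs)
    if focal is not None:
        population_specific[sv_id] = focal
```

In this toy input only `sv_001` satisfies both thresholds. `sv_002` is near fixation in two pools, and `sv_003` is found in one pool only but at intermediate frequency, so it is not counted as fixed.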