Institute of Evolutionary Ecology and Conservation Genomics, Ulm Universität, Ulm, Germany.
Zoological Institute, Animal Ecology and Conservation, Biocenter Grindel, Universität Hamburg, Hamburg, Germany.
Mol Ecol Resour. 2021 Apr;21(3):982-998. doi: 10.1111/1755-0998.13290. Epub 2020 Nov 21.
Genotyping complex multigene families in novel systems is particularly challenging. Target primers frequently amplify multiple loci simultaneously, leading to high rates of PCR and sequencing artefacts such as chimeras and allele amplification bias. Most genotyping pipelines have been validated in nonmodel systems, in which the true genotype is unknown and the generation of artefacts may be highly repeatable. Further hindering accurate genotyping, the relationship between artefacts and genotype complexity (i.e. number of alleles per genotype) within a PCR remains poorly described. Here, we investigated this relationship by experimentally combining multiple known major histocompatibility complex (MHC) haplotypes of a model organism (chicken, Gallus gallus; 43 artificial genotypes with 2-13 alleles per amplicon). In addition to well-defined 'optimal' primers, we simulated a nonmodel species situation by designing 'cross-species' primers based on sequence data from closely related Galliform species. We applied a novel open-source genotyping pipeline (ACACIA; https://gitlab.com/psc_santos/ACACIA) and compared its performance with that of a previously published pipeline (AmpliSAS). Allele calling accuracy was higher with ACACIA than with AmpliSAS (98.5% versus 97% for the 'optimal' data set and 77.8% versus 75% for the 'cross-species' data set). Systematic dropout of three alleles owing to primer mismatch in the 'cross-species' data set explained why allele calling repeatability was high (100% when using ACACIA) despite low accuracy, demonstrating that repeatability can be misleading when evaluating genotyping workflows. Genotype complexity was positively associated with nonchimeric artefacts, chimeric artefacts (nonlinearly, levelling off when more than 4-6 alleles were amplified) and allele amplification bias. Our study illustrates pitfalls that researchers should avoid in order to genotype complex multigene families reliably.
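The point that repeatability can be high while accuracy is low can be illustrated with a minimal sketch. It assumes a simple set-overlap (Jaccard) score for both measures, which is not necessarily the definition used in the study, and the allele names are hypothetical. An allele that drops out systematically is missing from every replicate, so the replicates agree perfectly with each other while still disagreeing with the known genotype.

```python
def set_agreement(alleles_a, alleles_b):
    """Jaccard overlap between two allele sets (assumed metric, for illustration only)."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    # Two empty calls are treated as being in full agreement.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical known genotype with four alleles.
true_genotype = {"A01", "A02", "A03", "A04"}

# Hypothetical calls from two technical replicates: A04 is lost in both,
# as would happen with a systematic primer mismatch.
replicate_1 = {"A01", "A02", "A03"}
replicate_2 = {"A01", "A02", "A03"}

# Accuracy compares a call with the known genotype.
accuracy = set_agreement(true_genotype, replicate_1)

# Repeatability compares the two replicates with each other.
repeatability = set_agreement(replicate_1, replicate_2)

print(f"accuracy = {accuracy:.2f}, repeatability = {repeatability:.2f}")
```

In this toy case repeatability is perfect even though one of the four true alleles is never called, which is why validation against known genotypes is more informative than replicate agreement alone.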