Lighten Jackie, van Oosterhout Cock, Bentzen Paul
Department of Biology, Marine Gene Probe Laboratory, Dalhousie University, Halifax, Nova Scotia, Canada.
Mol Ecol. 2014 Aug;23(16):3957-72. doi: 10.1111/mec.12843. Epub 2014 Jul 21.
The genotyping of highly polymorphic multigene families across many individuals used to be a particularly challenging task because of methodological limitations associated with traditional approaches. Next-generation sequencing (NGS) can overcome most of these limitations, and it is increasingly being applied in population genetic studies of multigene families. Here, we critically review NGS bioinformatic approaches that have been used to genotype the major histocompatibility complex (MHC) immune genes, and we discuss how the significant advances made in this field are applicable to population genetic studies of gene families. Increasingly, approaches are introduced that apply thresholds of sequencing depth and sequence similarity to separate alleles from methodological artefacts. We explain why these approaches are particularly sensitive to methodological biases by violating fundamental genotyping assumptions. An alternative strategy that utilizes ultra-deep sequencing (hundreds to thousands of sequences per amplicon) to reconstruct genotypes and applies statistical methods on the sequencing depth to separate alleles from artefacts appears to be more robust. Importantly, the 'degree of change' (DOC) method avoids using arbitrary cut-off thresholds by looking for statistical boundaries between the sequencing depth for alleles and artefacts, and hence, it is entirely repeatable across studies. Although the advances made in generating NGS data are still far ahead of our ability to perform reliable processing, analysis and interpretation, the community is developing statistically rigorous protocols that will allow us to address novel questions in evolution, ecology and genetics of multigene families. Future developments in third-generation single molecule sequencing may potentially help overcome problems that still persist in de novo multigene amplicon genotyping when using current second-generation sequencing approaches.
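As an illustration of looking for a boundary in sequencing depth rather than applying a fixed cut-off, the following is a minimal sketch and not the authors' published implementation. It assumes the input is a list of read counts per candidate variant within one amplicon. It uses the ratio between successive ranked depths as a simple stand-in for the "degree of change", and it limits the scan to the most abundant variants. The function name `doc_breakpoint` and the parameter `top_n` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def doc_breakpoint(depths: List[int], top_n: int = 10) -> Tuple[int, List[float]]:
    """Estimate how many variants in one amplicon are putative alleles.

    Simplified, assumed scoring: the drop in depth between consecutively
    ranked variants, expressed as a percentage of all drops. The largest
    drop is taken as the boundary between alleles and artefacts.
    """
    # Rank variants from most to least sequenced; ignore zero-depth entries.
    ranked = sorted((d for d in depths if d > 0), reverse=True)[:top_n]
    if len(ranked) < 2:
        raise ValueError("need at least two variants with non-zero depth")

    # Fold change in depth between each variant and the next-ranked one.
    ratios = [ranked[i] / ranked[i + 1] for i in range(len(ranked) - 1)]

    # Express each change as a percentage of the total change.
    total = sum(ratios)
    doc_percent = [100.0 * r / total for r in ratios]

    # Variants ranked above the largest change are called as alleles.
    n_alleles = doc_percent.index(max(doc_percent)) + 1
    return n_alleles, doc_percent


if __name__ == "__main__":
    # Hypothetical read counts for seven variants in one amplicon.
    n, doc = doc_breakpoint([15, 410, 4, 380, 22, 395, 9])
    print(n, [round(x, 1) for x in doc])
```

In this made-up example three variants have similar, high depth and the rest are far rarer, so the largest change falls after the third ranked variant. Real data would also need per-amplicon quality checks, which this sketch leaves out.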