Department of Ocean Sciences, University of California, Santa Cruz, CA, USA.
Southwest Fisheries Science Center, National Marine Fisheries Service, Santa Cruz, CA, USA.
Mol Ecol Resour. 2018 Mar;18(2):296-305. doi: 10.1111/1755-0998.12737. Epub 2017 Dec 15.
The accelerating rate at which DNA sequence data are now generated by high-throughput sequencing instruments provides both opportunities and challenges for population genetic and ecological investigations of animals and plants. We show here how the common practice of calling genotypes from a single SNP per sequenced region ignores substantial additional information in the phased short-read sequences that are provided by these sequencing instruments. We target sequenced regions with multiple SNPs in kelp rockfish (Sebastes atrovirens) to determine "microhaplotypes" and then call these microhaplotypes as alleles at each locus. We then demonstrate how these multi-allelic marker data from such loci dramatically increase power for relationship inference. The microhaplotype approach decreases false-positive rates by several orders of magnitude, relative to calling bi-allelic SNPs, for two challenging analytical procedures, full-sibling and single parent-offspring pair identification. We also show how the identification of half-sibling pairs requires so much data that physical linkage becomes a consideration, and that most published studies that attempt to do so are dramatically underpowered. The advent of phased short-read DNA sequence data, in conjunction with emerging analytical tools for their analysis, promises to improve efficiency by reducing the number of loci necessary for a particular level of statistical confidence, thereby lowering the cost of data collection and reducing the degree of physical linkage amongst markers used for relationship estimation. Such advances will facilitate collaborative research and management for migratory and other widespread species.
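The core idea in the abstract, treating the combination of SNP alleles observed together on one read as a single multi-allelic "microhaplotype" allele, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `call_microhaplotype`, the `min_depth` threshold, and the simple "two best-supported haplotypes" rule are all assumptions made for illustration, and the reads are assumed to be already aligned so that every read starts at the same locus coordinate.

```python
from collections import Counter


def call_microhaplotype(reads, snp_offsets, min_depth=2):
    """Call a diploid microhaplotype genotype at one locus (illustrative only).

    reads       -- aligned read strings that all start at the same coordinate
                   (assumption for this sketch)
    snp_offsets -- non-empty list of 0-based SNP positions within the locus
    min_depth   -- hypothetical minimum read count needed to accept a haplotype
    """
    last = max(snp_offsets)
    # Each read is phased by construction: the bases it carries at the SNP
    # positions were observed together on one DNA molecule.
    haplotypes = Counter(
        "".join(read[i] for i in snp_offsets)
        for read in reads
        if len(read) > last  # skip reads that do not span every SNP
    )
    # Keep only haplotypes with enough supporting reads (crude error filter).
    supported = [h for h, n in haplotypes.most_common() if n >= min_depth]
    if not supported:
        return None  # no confident call
    if len(supported) == 1:
        return (supported[0], supported[0])  # treated as homozygous
    # Treat the two best-supported haplotypes as the two alleles.
    return tuple(sorted(supported[:2]))


if __name__ == "__main__":
    reads = ["ACGTACGT"] * 3 + ["ATGTGCGT"] * 3 + ["ACGTACTT"]
    print(call_microhaplotype(reads, [1, 4, 6]))
```

With three SNPs in a locus, up to eight distinct haplotype alleles are possible instead of two alleles per SNP, which is the source of the extra information for relationship inference described above. A real genotyper would also need to model allele balance and sequencing error rather than rely on a fixed depth cutoff.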