Department of Animal Sciences, Animal Breeding and Genetics Group, University of Goettingen, 37075, Germany.
Center for Integrated Breeding Research, University of Goettingen, 37075, Germany.
Genetics. 2019 Aug;212(4):1045-1061. doi: 10.1534/genetics.119.302283. Epub 2019 May 31.
The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction. We propose a novel approach ("HaploBlocker") for defining and inferring haplotype blocks that focuses on linkage instead of the commonly used population-wide measures of linkage disequilibrium. We define a haplotype block as a sequence of genetic markers that has a predefined minimum frequency in the population, and only haplotypes with a similar sequence of markers are considered to carry that block, effectively screening a dataset for group-wise identity-by-descent. From these haplotype blocks, we construct a haplotype library that represents a large proportion of genetic variability with a limited number of blocks. Our method is implemented in the associated R-package HaploBlocker, and provides flexibility not only to optimize the structure of the obtained haplotype library for subsequent analyses, but also to handle datasets of different marker density and genetic diversity. By using haplotype blocks instead of single nucleotide polymorphisms (SNPs), local epistatic interactions can be naturally modeled, and the reduced number of parameters enables a wide variety of new methods for further genomic analyses such as genomic prediction and the detection of selection signatures. We illustrate our methodology with a dataset comprising 501 doubled haploid lines in a European maize landrace genotyped at 501,124 SNPs. With the suggested approach, we identified 2991 haplotype blocks with an average length of 2685 SNPs that together represent 94% of the dataset.
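The block definition above (a marker sequence with a minimum frequency, carried only by haplotypes with a similar sequence) and the notion of a library "representing" a share of the dataset can be illustrated with a small sketch. This is not the HaploBlocker R package or its algorithm: the function names `block_carriers` and `coverage`, the parameters `min_frequency` and `max_mismatch`, and the toy data are all hypothetical. Similarity is simplified here to a mismatch count within a fixed marker window.

```python
import numpy as np


def block_carriers(haplotypes, start, end, block_alleles, max_mismatch=0):
    """Boolean mask of haplotypes that carry a candidate block.

    Assumption for this sketch: a haplotype carries the block if its alleles
    in markers [start, end) differ from the block's allele sequence at no
    more than `max_mismatch` positions.
    """
    window = haplotypes[:, start:end]
    mismatches = (window != np.asarray(block_alleles)).sum(axis=1)
    return mismatches <= max_mismatch


def coverage(haplotypes, blocks, min_frequency=2, max_mismatch=0):
    """Fraction of (haplotype, marker) cells covered by at least one block.

    Blocks carried by fewer than `min_frequency` haplotypes are discarded,
    mirroring the predefined minimum frequency in the block definition.
    """
    covered = np.zeros(haplotypes.shape, dtype=bool)
    for start, end, alleles in blocks:
        carriers = block_carriers(haplotypes, start, end, alleles, max_mismatch)
        if carriers.sum() >= min_frequency:
            covered[carriers, start:end] = True
    return covered.mean()


# Toy data: 4 haplotypes (rows) genotyped at 6 biallelic markers (columns).
haplotypes = np.array([
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
])

# Hypothetical block library: (start marker, end marker, allele sequence).
blocks = [
    (0, 4, [0, 1, 1, 0]),  # carried by haplotypes 0, 1 and 2
    (4, 6, [1, 0]),        # carried by haplotypes 0, 2 and 3
    (0, 4, [1, 0, 0, 1]),  # carried by haplotype 3 only, so it is dropped
]

print(block_carriers(haplotypes, 0, 4, [0, 1, 1, 0]))
print(coverage(haplotypes, blocks, min_frequency=2))
```

In this toy example, two blocks pass the frequency threshold and together cover 18 of the 24 cells. The third block is excluded because only one haplotype carries it. Raising `min_frequency` trades coverage for fewer, more common blocks, which is one way to read the flexibility the abstract describes.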