School of Environmental and Rural Science, University of New England, Armidale, Australia.
Genet Sel Evol. 2014 Feb 4;46(1):11. doi: 10.1186/1297-9686-46-11.
Identifying recombination events and the chromosomal segments that constitute a gamete is useful for a number of applications in genomic analyses. In livestock, genotypic data are commonly available for half-sib families. We propose a straightforward and computationally efficient method that uses single nucleotide polymorphism marker genotypes of half-sibs to reconstruct the recombination and segregation events that occurred during meiosis in a sire to form the haplotypes observed in its offspring. These meiosis events determine a block structure in the paternal haplotypes of the progeny. This structure can be used to phase the genotypes of individuals in single half-sib families, to impute the haplotypes of the sire if it is not genotyped, or to impute the paternal strand of the offspring's sequence based on sequence data of the sire.
The hsphase algorithm exploits information from opposing homozygotes among half-sibs to identify recombination events and the chromosomal regions from the paternal and maternal strands of the sire (blocks) that were inherited by its progeny. This information is then used to impute the sire's genotype, which, in turn, is used to phase the half-sib family. Accuracy (defined as R²) and performance of this approach were evaluated using simulated and real datasets. Phasing results for the half-sibs were benchmarked against other commonly used phasing programs: AlphaPhase, BEAGLE and PedPhase 3.
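The role of opposing homozygotes can be illustrated with a minimal sketch. This is not the hsphase implementation; it only shows the underlying observation that when two half-sibs are homozygous for different alleles at a marker, they must have received different paternal alleles, so the sire is heterozygous there. The genotype coding (0, 1, 2 as allele counts, `None` for missing), the function names and the toy data are assumptions made for illustration. The homozygous calls in `impute_sire` are a simple heuristic that becomes more reliable as family size grows, not a certain inference.

```python
def sire_heterozygous_sites(genotypes):
    """Return marker indices where half-sibs show opposing homozygotes.

    genotypes: list of per-animal lists, coded 0/1/2 (hypothetical coding),
    with None for a missing call.
    """
    n_markers = len(genotypes[0])
    sites = []
    for m in range(n_markers):
        observed = {animal[m] for animal in genotypes}
        # One sib is 0 and another is 2: the sire passed on both alleles.
        if 0 in observed and 2 in observed:
            sites.append(m)
    return sites


def impute_sire(genotypes):
    """Heuristically impute the sire genotype at each marker.

    Returns 1 where opposing homozygotes prove heterozygosity, 0 or 2 where
    only one homozygous class is seen (an assumption, not a proof), and
    None where no half-sib is homozygous.
    """
    n_markers = len(genotypes[0])
    sire = []
    for m in range(n_markers):
        observed = {animal[m] for animal in genotypes}
        if 0 in observed and 2 in observed:
            sire.append(1)
        elif 0 in observed:
            sire.append(0)
        elif 2 in observed:
            sire.append(2)
        else:
            sire.append(None)
    return sire


# Toy family: three half-sibs genotyped at five markers (made-up data).
half_sibs = [
    [0, 1, 2, 0, 1],
    [2, 1, 2, 1, 1],
    [0, 0, 1, 0, 1],
]

het_sites = sire_heterozygous_sites(half_sibs)
sire_genotype = impute_sire(half_sibs)
```

In this toy family only the first marker shows opposing homozygotes, so it is the only site where the sire is known to be heterozygous. Sites of this kind are the informative ones for locating recombination events, because a switch in the paternal strand inherited by a half-sib can only be observed where the sire's two strands differ.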
Using a simulated dataset with 20 markers per cM, the accuracy of block detection was 0.58 and 0.96 for half-sib family sizes of 4 and 40, respectively. For the same family sizes, the accuracy of inferring sire genotypes was 0.75 and 1.00, respectively, and the accuracy of phasing was around 0.97. hsphase was more robust to genotyping errors than PedPhase 3, AlphaPhase and BEAGLE. Computationally, hsphase was much faster than AlphaPhase and BEAGLE.
In half-sib families of size 8 and above, hsphase can accurately detect block structure of paternal haplotypes, impute genotypes of ungenotyped sires and reconstruct haplotypes in progeny. The method is much faster and more accurate than other widely used population-based phasing programs. A program implementing the method is freely available as an R package (hsphase).