Gonen Serap, Ros-Freixedes Roger, Battagin Mara, Gorjanc Gregor, Hickey John M
The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK.
Genet Sel Evol. 2017 May 18;49(1):47. doi: 10.1186/s12711-017-0322-5.
This paper describes a method, called AlphaSeqOpt, for the allocation of sequencing resources in livestock populations with existing phased genomic data to maximise the ability to phase and impute sequenced haplotypes into the whole population.
We present two algorithms. The first selects focal individuals that collectively represent the maximum possible portion of the haplotype diversity in the population. The second allocates a fixed sequencing budget among the families of focal individuals to enable phasing of their haplotypes at the sequence level. We tested the performance of the two algorithms in simulated pedigrees. For each pedigree, we evaluated the proportion of population haplotypes that are carried by the focal individuals and compared our results to a variant of the widely used key ancestors approach and to two haplotype-based approaches. We also calculated the expected phasing accuracy of the haplotypes of a focal individual at the sequence level, given the proportion of the fixed sequencing budget allocated to its family.
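The idea behind the first algorithm can be illustrated with a minimal sketch. This is not the AlphaSeqOpt implementation; it is a simple greedy heuristic, assuming each individual is described by the set of haplotype identifiers it carries. The function name `select_focal_individuals`, the input format, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def select_focal_individuals(carried, n_focal):
    """Greedily pick individuals that cover the most haplotype copies.

    carried: dict mapping individual ID -> set of haplotype IDs it carries
             (hypothetical input format, not the AlphaSeqOpt file format).
    Returns (list of focal IDs, proportion of haplotype copies covered).
    """
    # Frequency of a haplotype = number of individuals carrying it.
    freq = Counter(h for haps in carried.values() for h in haps)
    total = sum(freq.values())
    covered = set()
    focal = []

    for _ in range(min(n_focal, len(carried))):
        candidates = [i for i in sorted(carried) if i not in focal]

        # Gain = summed frequency of the candidate's not-yet-covered haplotypes.
        def gain(ind):
            return sum(freq[h] for h in carried[ind] - covered)

        # max() returns the first maximal item, so ties go to the smallest ID.
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # nothing new left to cover
        focal.append(best)
        covered |= carried[best]

    proportion = sum(freq[h] for h in covered) / total if total else 0.0
    return focal, proportion


if __name__ == "__main__":
    toy = {
        "A": {"h1", "h2"},
        "B": {"h1", "h3"},
        "C": {"h2", "h3"},
        "D": {"h1", "h4"},
    }
    print(select_focal_individuals(toy, 2))
```

A greedy pass like this only approximates the maximum possible coverage. It is shown to make the selection criterion concrete, not to reproduce the published results.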
AlphaSeqOpt maximises the ability to capture and phase the most frequent haplotypes in a population in three ways. First, it selects focal individuals that collectively represent a larger portion of the population haplotype diversity than those selected by existing methods. Second, it selects focal individuals from across the pedigree whose haplotypes can be easily phased using family-based phasing and imputation algorithms, thereby maximising the ability to impute sequence into the rest of the population. Third, it allocates more of the fixed sequencing budget to focal individuals whose haplotypes are more frequent in the population than to focal individuals whose haplotypes are less frequent. Unlike existing methods, AlphaSeqOpt also includes an algorithm that allocates part of the sequencing budget to the families (i.e. immediate ancestors) of focal individuals to ensure that their haplotypes can be phased at the sequence level, which is essential for enabling and maximising subsequent sequence imputation.
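The third point, giving more of the budget to focal individuals with more frequent haplotypes, can be sketched as follows. The abstract does not state the actual allocation rule, so the strictly proportional split used here is an assumption, and the function name `allocate_budget` and its inputs are hypothetical.

```python
def allocate_budget(haplotype_frequency, budget):
    """Split a fixed budget among focal individuals (and their families).

    haplotype_frequency: dict mapping focal ID -> summed population frequency
                         of the haplotypes it carries (hypothetical input).
    budget: total sequencing budget in arbitrary units.
    Assumption: shares are directly proportional to frequency; the published
    method may use a different rule.
    """
    total = sum(haplotype_frequency.values())
    if total <= 0:
        raise ValueError("haplotype frequencies must sum to a positive value")
    return {
        ind: budget * f / total
        for ind, f in haplotype_frequency.items()
    }


if __name__ == "__main__":
    print(allocate_budget({"A": 5, "B": 3, "C": 2}, 100.0))
```

Each share would then be divided further between the focal individual and its immediate ancestors. That step is omitted here because the abstract gives no detail on how it is done.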
We present a new method for the allocation of a fixed sequencing budget to focal individuals and their families such that the final sequenced haplotypes, when phased at the sequence level, represent the maximum possible portion of the haplotype diversity in the population that can be sequenced and phased at that budget.