Program in Applied and Computational Mathematics, Princeton University, New Jersey 08544
Lewis-Sigler Institute for Integrative Genomics, Princeton University, New Jersey 08544
Genetics. 2019 Aug;212(4):1009-1029. doi: 10.1534/genetics.119.302159. Epub 2019 Apr 26.
We introduce a simple and computationally efficient method for fitting the admixture model of genetic population structure, called ALStructure. The strategy of ALStructure is to first estimate the low-dimensional linear subspace of the population admixture components, and then search for a model within this subspace that is consistent with the admixture model's natural probabilistic constraints. Central to this strategy is the observation that all models belonging to this constrained space of solutions are risk-minimizing and have equal likelihood, rendering any additional optimization unnecessary. The low-dimensional linear subspace is estimated through a recently introduced principal components analysis method that is appropriate for genotype data, thereby providing a solution that has both principal components and probabilistic admixture interpretations. Our approach differs fundamentally from existing methods for estimating admixture, which aim to fit the admixture model directly by searching for parameters that maximize the likelihood function or the posterior probability. We observe that ALStructure typically outperforms existing methods in both accuracy and computational speed across a wide array of simulated and real human genotype datasets. Throughout this work, we emphasize that the admixture model is a special case of a much broader class of models for which algorithms similar to ALStructure may be successfully employed.
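The two-stage strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions made here for illustration: genotypes are simulated as Binomial(2, F) draws with F = PQ; a plain truncated SVD stands in for the genotype-appropriate principal components method; and the constrained model is sought by alternating least squares with clipping and renormalization. All function names are hypothetical.

```python
import numpy as np

def simulate_genotypes(m, n, k, rng):
    """Simulate m SNPs for n individuals from k ancestral populations (toy data)."""
    P = rng.uniform(0.05, 0.95, size=(m, k))      # allele frequencies in (0, 1)
    Q = rng.dirichlet(np.ones(k), size=n).T       # k x n, each column sums to 1
    F = P @ Q                                     # individual-specific frequencies
    X = rng.binomial(2, F)                        # genotypes in {0, 1, 2}
    return X, P, Q

def estimate_frequencies_in_subspace(X, k):
    """Stage 1 (assumption: plain rank-k truncated SVD of X/2 as a stand-in)."""
    U, s, Vt = np.linalg.svd(X / 2.0, full_matrices=False)
    F_hat = (U[:, :k] * s[:k]) @ Vt[:k]
    return np.clip(F_hat, 0.0, 1.0)

def clip_and_normalize_columns(Q):
    """Make entries positive and rescale each column to sum to 1."""
    Q = np.clip(Q, 1e-12, None)
    return Q / Q.sum(axis=0, keepdims=True)

def constrained_factorization(F_hat, k, rng, n_iter=200):
    """Stage 2 (assumption: clipped alternating least squares for F_hat ~ P Q)."""
    n = F_hat.shape[1]
    Q = clip_and_normalize_columns(rng.uniform(size=(k, n)))
    for _ in range(n_iter):
        # Least-squares update of P given Q, then force entries into [0, 1].
        P = np.clip(F_hat @ np.linalg.pinv(Q), 0.0, 1.0)
        # Least-squares update of Q given P, then force columns onto the simplex.
        Q = clip_and_normalize_columns(np.linalg.pinv(P) @ F_hat)
    return P, Q

rng = np.random.default_rng(0)
X, P_true, Q_true = simulate_genotypes(m=200, n=50, k=3, rng=rng)
F_hat = estimate_frequencies_in_subspace(X, k=3)
P_hat, Q_hat = constrained_factorization(F_hat, k=3, rng=rng)
```

The returned factors satisfy the admixture model's constraints by construction: entries of P_hat lie in [0, 1], and each column of Q_hat is nonnegative and sums to 1. As in any such factorization, the populations are recovered only up to a relabeling of the k components.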