Frichot Eric, Mathieu François, Trouillon Théo, Bouchard Guillaume, François Olivier
Université Joseph Fourier Grenoble 1, Centre National de la Recherche Scientifique, Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques et Applications, Grenoble Unité Mixte de Recherche 5525, 38042 Grenoble, France.
Genetics. 2014 Apr;196(4):973-83. doi: 10.1534/genetics.113.160572. Epub 2014 Feb 4.
Inference of individual ancestry coefficients, which is important for population genetic and association studies, is commonly performed using computationally intensive likelihood algorithms. With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention. Reducing the computational burden of estimation algorithms remains, however, a major challenge. Here, we present a fast and efficient method for estimating individual ancestry coefficients based on sparse nonnegative matrix factorization algorithms. We implemented our method in the computer program sNMF and applied it to human and plant data sets. The performance of sNMF was then compared with that of the likelihood algorithm implemented in the computer program ADMIXTURE. Without loss of accuracy, sNMF computed estimates of ancestry coefficients with runtimes approximately 10–30 times shorter than those of ADMIXTURE.
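The abstract describes the method only as "sparse nonnegative matrix factorization." The following is a minimal sketch of that idea, assuming genotypes coded 0, 1, 2 and expanded into three binary indicator columns per locus. Under that assumption, the genotype matrix X is approximated by the product of a nonnegative matrix Q (one row of K ancestry coefficients per individual) and a nonnegative matrix G. The solver is a swapped-in, generic multiplicative-update scheme with an L1 penalty `alpha` on Q. It is not the algorithm implemented in sNMF, and all function names (`encode_genotypes`, `sparse_nmf`, `frobenius_error`) are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python product of A (n x k) and B (k x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def encode_genotypes(genotypes):
    """Encode each 0/1/2 genotype as three 0/1 indicator columns (assumed encoding)."""
    X = []
    for row in genotypes:
        encoded = []
        for g in row:
            encoded.extend(1.0 if g == v else 0.0 for v in (0, 1, 2))
        X.append(encoded)
    return X


def frobenius_error(X, Q, G):
    """Squared Frobenius norm of X - QG."""
    QG = matmul(Q, G)
    return sum((x - y) ** 2 for xr, yr in zip(X, QG) for x, y in zip(xr, yr))


def sparse_nmf(X, K, alpha=0.01, n_iter=200, seed=0, eps=1e-12):
    """Factor X (n x m) as Q (n x K) times G (K x m), all entries nonnegative.

    Generic multiplicative updates for 0.5*||X - QG||^2 + alpha*sum(Q);
    the alpha term is an L1 penalty that pushes Q toward sparsity.
    Hypothetical helper, not the solver used by sNMF.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    Q = [[rng.random() + eps for _ in range(K)] for _ in range(n)]
    G = [[rng.random() + eps for _ in range(m)] for _ in range(K)]
    for _ in range(n_iter):
        # Q <- Q * (X G^T) / (Q G G^T + alpha)
        Gt = transpose(G)
        num, den = matmul(X, Gt), matmul(Q, matmul(G, Gt))
        Q = [[q * a / (d + alpha) for q, a, d in zip(qr, ar, dr)]
             for qr, ar, dr in zip(Q, num, den)]
        # G <- G * (Q^T X) / (Q^T Q G)
        Qt = transpose(Q)
        num, den = matmul(Qt, X), matmul(matmul(Qt, Q), G)
        G = [[g * a / (d + eps) for g, a, d in zip(gr, ar, dr)]
             for gr, ar, dr in zip(G, num, den)]
    # Simplification: rescale each row of Q to sum to 1 so it reads as
    # ancestry proportions (this rescaled Q no longer reproduces X exactly).
    ancestry = []
    for qr in Q:
        s = sum(qr) or 1.0
        ancestry.append([q / s for q in qr])
    return Q, G, ancestry
```

For example, `sparse_nmf(encode_genotypes([[0, 0, 0, 0], [0, 0, 2, 2], [2, 2, 2, 2]]), K=2)` returns Q, G and the row-normalized coefficients. The penalty weight `alpha`, the iteration count and the seed are arbitrary illustrative values, and the sketch makes no claim about the runtime comparison with ADMIXTURE reported above.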