A Likelihood-Free Estimator of Population Structure Bridging Admixture Models and Principal Components Analysis.

Affiliations

Program in Applied and Computational Mathematics, Princeton University, New Jersey 08544

Lewis-Sigler Institute for Integrative Genomics, Princeton University, New Jersey 08544

Publication information

Genetics. 2019 Aug;212(4):1009-1029. doi: 10.1534/genetics.119.302159. Epub 2019 Apr 26.

Abstract

We introduce a simple and computationally efficient method for fitting the admixture model of genetic population structure, called ALStructure. The strategy of ALStructure is to first estimate the low-dimensional linear subspace of the population admixture components, and then search for a model within this subspace that is consistent with the admixture model's natural probabilistic constraints. Central to this strategy is the observation that all models belonging to this constrained space of solutions are risk-minimizing and have equal likelihood, rendering any additional optimization unnecessary. The low-dimensional linear subspace is estimated through a recently introduced principal components analysis method that is appropriate for genotype data, thereby providing a solution that has both principal components and probabilistic admixture interpretations. Our approach differs fundamentally from other existing methods for estimating admixture, which aim to fit the admixture model directly by searching for parameters that maximize the likelihood function or the posterior probability. We observe that ALStructure typically outperforms existing methods both in accuracy and computational speed under a wide array of simulated and real human genotype datasets. Throughout this work, we emphasize that the admixture model is a special case of a much broader class of models for which algorithms similar to ALStructure may be successfully employed.
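
The two-step strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual admixture factorization F = PQ, where P holds allele frequencies in [0, 1] (SNPs by populations) and each column of Q holds one individual's admixture proportions, which are non-negative and sum to one. A plain truncated SVD stands in for the genotype-specific principal components method the abstract mentions, and a simple alternating least squares loop with clipping and simplex projection stands in for the constrained search. The names `project_to_simplex` and `fit_admixture_sketch` are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of each column of v onto the probability simplex."""
    k, n = v.shape
    u = np.sort(v, axis=0)[::-1]              # each column sorted in descending order
    css = np.cumsum(u, axis=0) - 1.0          # cumulative sums minus the target total
    idx = np.arange(1, k + 1)[:, None]
    cond = u - css / idx > 0                  # True on a prefix of every column
    rho = cond.sum(axis=0) - 1                # last index where the condition holds
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_admixture_sketch(X, k, n_iter=200, seed=0):
    """Illustrative two-step fit: subspace estimate, then constrained factorization.

    X is an (m SNPs) x (n individuals) genotype matrix with entries in {0, 1, 2}.
    Returns P (m x k, entries in [0, 1]) and Q (k x n, columns on the simplex).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]

    # Step 1 (assumption): a rank-k truncated SVD of X / 2 as a stand-in for the
    # subspace estimate of the individual-specific allele frequencies F.
    U, s, Vt = np.linalg.svd(X / 2.0, full_matrices=False)
    F_hat = (U[:, :k] * s[:k]) @ Vt[:k]

    # Step 2 (assumption): alternate least squares updates of P and Q, pushing
    # each update back onto its constraint set.
    Q = project_to_simplex(rng.random((k, n)))
    for _ in range(n_iter):
        P = np.clip(F_hat @ np.linalg.pinv(Q), 0.0, 1.0)
        Q = project_to_simplex(np.linalg.pinv(P) @ F_hat)
    return P, Q


if __name__ == "__main__":
    # Small simulated example drawn from the admixture model itself.
    rng = np.random.default_rng(1)
    m, n, k = 200, 30, 3
    P_true = rng.uniform(0.05, 0.95, size=(m, k))
    Q_true = rng.dirichlet(np.ones(k), size=n).T
    X = rng.binomial(2, P_true @ Q_true)
    P, Q = fit_admixture_sketch(X, k)
    print(P.shape, Q.shape)
```

Note that no likelihood is evaluated anywhere in the loop, which is the sense in which such an approach is likelihood-free. The sketch only guarantees that the returned factors satisfy the constraints; it makes no claim about matching the accuracy or speed reported for ALStructure.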

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c55/6707457/2375405eb091/1009f1.jpg
