Probabilistic models of genetic variation in structured populations applied to global human studies.

Author information

Hao Wei, Song Minsun, Storey John D

Affiliations

Lewis-Sigler Institute for Integrative Genomics and.

Lewis-Sigler Institute for Integrative Genomics and Center for Statistics and Machine Learning, Princeton University, Princeton, NJ 08544, USA.

Publication information

Bioinformatics. 2016 Mar 1;32(5):713-21. doi: 10.1093/bioinformatics/btv641. Epub 2015 Nov 6.

Abstract

MOTIVATION

Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral populations for each individual. Here, we instead focus on modeling variation of the genotypes without requiring a higher-level admixture interpretation.

RESULTS

We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis can be utilized to estimate a general model that includes the well-known Pritchard-Stephens-Donnelly admixture model as a special case. Noting some drawbacks of this approach, we introduce a new 'logistic factor analysis' framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions.
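
The logistic factor analysis idea can be illustrated with a short NumPy sketch. It assumes that genotypes x_ij in {0, 1, 2} (SNP i, individual j) are Binomial(2, π_ij) with logit(π_ij) = Σ_k a_ik h_kj, where the h_kj are latent factors and one factor row is a constant intercept. The sketch follows a simplified outline: a low-rank SVD of the row-centred genotypes, a logit transform of the resulting allele-frequency estimates, a second SVD to extract the factors, and then per-SNP logistic regression for the loadings a_ik. The function names `logistic_factors` and `fit_loadings`, the clipping threshold 1/(2n), and the convention that `d` counts the intercept row are all assumptions made for illustration. They are not the lfa package's API or its exact algorithm.

```python
import numpy as np


def logistic_factors(X, d, eps=None):
    """Hypothetical sketch: estimate d logistic factors (intercept included).

    X: (m, n) genotype matrix, SNPs x individuals, coded 0/1/2, no missing values.
    d: number of factors, counting the all-ones intercept row (assumption).
    Returns H with shape (d, n); its last row is the intercept.
    """
    n = X.shape[1]
    k = d - 1                                  # number of non-intercept factors
    if eps is None:
        eps = 1.0 / (2 * n)                    # assumed clipping threshold
    # Step 1: rank-k SVD approximation of the row-centred genotypes, shifted
    # back and divided by 2 to give individual-specific allele frequencies.
    mu = X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    F_hat = ((U[:, :k] * s[:k]) @ Vt[:k] + mu) / 2.0
    # Step 2: keep SNPs whose fitted frequencies all lie in (eps, 1 - eps), then logit.
    keep = np.all((F_hat > eps) & (F_hat < 1.0 - eps), axis=1)
    if keep.sum() < max(k, 1):
        raise ValueError("too few SNPs with fitted frequencies inside (eps, 1 - eps)")
    Z = np.log(F_hat[keep] / (1.0 - F_hat[keep]))
    # Step 3: top-k right singular vectors of the row-centred logit matrix,
    # stacked with a constant intercept row.
    Z -= Z.mean(axis=1, keepdims=True)
    _, _, Vz = np.linalg.svd(Z, full_matrices=False)
    return np.vstack([Vz[:k], np.ones((1, n))])


def fit_loadings(X, H, n_iter=25):
    """Per-SNP logistic regression x_ij ~ Binomial(2, sigmoid(a_i . h_j)) via Newton's method."""
    D = H.T                                    # (n, d) design matrix
    A = np.zeros((X.shape[0], H.shape[0]))
    for i in range(X.shape[0]):
        a = np.zeros(H.shape[0])
        for _ in range(n_iter):
            eta = np.clip(D @ a, -30.0, 30.0)  # guard against overflow in exp
            p = 1.0 / (1.0 + np.exp(-eta))
            grad = D.T @ (X[i] - 2.0 * p)      # score of the Binomial(2, p) log-likelihood
            hess = D.T @ (D * (2.0 * p * (1.0 - p))[:, None])  # Fisher information
            a += np.linalg.solve(hess + 1e-8 * np.eye(a.size), grad)
        A[i] = a
    return A
```

The estimated individual-specific allele frequencies are then `1 / (1 + np.exp(-(A @ H)))`. SNPs that are monomorphic or perfectly separated by the factors would need regularization, which this sketch omits.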

AVAILABILITY AND IMPLEMENTATION

A Bioconductor R package called lfa is available at http://www.bioconductor.org/packages/release/bioc/html/lfa.html

CONTACT

jstorey@princeton.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
