Estimation of genetic admixture proportions via haplotypes.

Authors

Ko Seyoon, Sobel Eric M, Zhou Hua, Lange Kenneth

Affiliations

Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.

Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA.

Publication information

Comput Struct Biotechnol J. 2024 Dec 6;23:4384-4395. doi: 10.1016/j.csbj.2024.11.043. eCollection 2024 Dec.

Abstract

Estimation of ancestral admixture is essential for creating personal genealogies, studying human history, and conducting genome-wide association studies (GWAS). Three primary methods exist for estimating admixture coefficients. The frequentist approach directly maximizes the binomial loglikelihood. The Bayesian approach adds a reasonable prior and samples the posterior distribution. Finally, the nonparametric approach decomposes the genotype matrix algebraically. Each approach scales successfully to datasets with a million individuals and a million single nucleotide polymorphisms (SNPs). Despite their variety, all current approaches assume independence between SNPs. Achieving independence requires linkage disequilibrium (LD) filtering before analysis. Unfortunately, this tactic loses valuable information and usually retains many SNPs still in LD. The present paper explores the option of explicitly incorporating haplotypes in ancestry estimation. Our program, HaploADMIXTURE, operates on adjacent SNP pairs and jointly estimates their haplotype frequencies along with admixture coefficients. This more complex strategy takes advantage of the rich information available in haplotypes and ultimately yields better admixture estimates and better clustering of real populations in curated datasets.
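
To make the frequentist objective mentioned in the abstract concrete, the following is a minimal sketch of the binomial loglikelihood under the usual SNP-independent admixture model, in which individual i's allele frequency at SNP j is the mixture sum over k of q_ik * p_kj. It is not the HaploADMIXTURE objective, which works on adjacent SNP pairs and their haplotype frequencies. The function name `admixture_loglik`, the array layout, and the clipping constant `eps` are illustrative assumptions, and the constant binomial coefficient term is omitted.

```python
import numpy as np


def admixture_loglik(G, Q, P, eps=1e-12):
    """Binomial loglikelihood for the SNP-independent admixture model.

    Hypothetical helper for illustration only (not taken from any package).
    G: (n, m) array of allele counts in {0, 1, 2}
    Q: (n, K) admixture coefficients; each row sums to 1
    P: (K, m) allele frequencies for each ancestral population
    """
    G = np.asarray(G, dtype=float)
    # Mixture allele frequency for every individual and SNP.
    F = np.asarray(Q, dtype=float) @ np.asarray(P, dtype=float)
    # Keep frequencies away from 0 and 1 so the logs stay finite.
    F = np.clip(F, eps, 1.0 - eps)
    # Sum over individuals and SNPs; the binomial coefficient is dropped
    # because it does not depend on Q or P.
    return float(np.sum(G * np.log(F) + (2.0 - G) * np.log1p(-F)))


# Toy data: one individual, three SNPs, two ancestral populations.
G_demo = np.array([[0, 1, 2]])
Q_demo = np.array([[0.5, 0.5]])
P_demo = np.array([[0.2, 0.4, 0.6],
                   [0.4, 0.6, 0.8]])
loglik_demo = admixture_loglik(G_demo, Q_demo, P_demo)
```

A frequentist fit would maximize this quantity over Q and P subject to the simplex constraint on each row of Q and the bounds 0 ≤ p_kj ≤ 1. Because the sum factorizes over SNPs, the model treats SNPs as independent, which is the assumption the abstract says haplotype-based estimation is meant to relax.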

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f696/11683265/af2e891019c2/gr001.jpg
