Heinzel Carola Sophia, Baumdicker Franz, Pfaffelhuber Peter
Department of Mathematical Stochastics, Albert-Ludwigs-University Freiburg, Freiburg im Breisgau 79104, Germany.
Cluster of Excellence "Controlling Microbes to Fight Infections", Mathematical and Computational Population Genetics, University of Tübingen, Sand 14, Tübingen 72076, Germany.
G3 (Bethesda). 2025 Aug 6;15(8). doi: 10.1093/g3journal/jkaf142.
Many ancestry inference tools, including STRUCTURE and ADMIXTURE, rely on the admixture model to infer both allele frequencies p and individual admixture proportions q for a collection of individuals relative to a set of hypothetical ancestral populations. We show that under realistic conditions the likelihood in the admixture model is typically flat in some direction around a maximum-likelihood estimate (q^,p^). In particular, the maximum-likelihood estimator is nonunique, and there is a complete spectrum of possible estimates. Common inference tools typically identify only a few points within this spectrum. We provide an algorithm that computes the set of equally likely (q,p) when starting from (q^,p^). It is analytic for K=2 ancestral populations and numeric for K>2. We apply our algorithm to data from the 1000 Genomes Project and show that inter-European estimators of q can come with a large set of equally likely possibilities. In general, markers with large allele frequency differences between populations, in combination with individuals with concentrated admixture proportions, lead to small areas with a flat likelihood. Our findings imply that care must be taken when interpreting results from STRUCTURE and ADMIXTURE if populations are not separated well enough.
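The flatness described in the abstract can be illustrated with a small numerical sketch for K=2. It assumes the standard binomial form of the admixture likelihood, in which the genotype of individual i at marker m is Binomial(2, theta_im) with theta = q p, so the likelihood depends on (q,p) only through the product q p. Under that assumption, any invertible 2x2 matrix S whose rows sum to 1 gives a pair (q S, S^{-1} p) with the same likelihood, provided the new values remain valid proportions and frequencies. The function names, the simulated data, and the particular S below are illustrative choices, not the algorithm from the paper.

```python
import numpy as np

def log_likelihood(x, q, p):
    """Binomial admixture log-likelihood, up to an additive constant.

    x: (N, M) allele counts in {0, 1, 2}
    q: (N, K) admixture proportions, rows sum to 1
    p: (K, M) allele frequencies in the ancestral populations
    """
    theta = q @ p  # expected allele frequency per individual and marker
    return float(np.sum(x * np.log(theta) + (2 - x) * np.log(1 - theta)))

def reparametrize(q, p, a, b):
    """Hypothetical helper for K=2: map (q, p) to (q S, S^{-1} p).

    S = [[1-a, a], [b, 1-b]] has rows summing to 1, so rows of q S
    still sum to 1. The product q p is unchanged by construction.
    """
    S = np.array([[1.0 - a, a], [b, 1.0 - b]])
    return q @ S, np.linalg.solve(S, p)

def is_valid(q, p, tol=1e-12):
    """Check that q holds proportions and p holds frequencies."""
    return bool(
        np.all(q >= -tol)
        and np.allclose(q.sum(axis=1), 1.0)
        and np.all(p >= -tol)
        and np.all(p <= 1.0 + tol)
    )

# Simulated toy data (assumption: weakly separated populations,
# individuals with non-concentrated admixture proportions).
rng = np.random.default_rng(0)
N, M = 5, 8
q1 = rng.uniform(0.2, 0.8, size=N)
q = np.column_stack([q1, 1.0 - q1])
p = rng.uniform(0.3, 0.7, size=(2, M))
x = rng.binomial(2, q @ p)

# A second parameter pair with exactly the same likelihood.
q_new, p_new = reparametrize(q, p, a=0.05, b=0.05)

print("valid alternative:", is_valid(q_new, p_new))
print("log-likelihood (original):   ", log_likelihood(x, q, p))
print("log-likelihood (alternative):", log_likelihood(x, q_new, p_new))
```

In this toy setting the alternative pair is valid because no admixture proportion is close to 0 or 1 and no allele frequency is close to the boundary. If individuals have concentrated proportions, or markers have large frequency differences between populations, fewer choices of S keep (q S, S^{-1} p) inside the allowed ranges, which matches the abstract's statement that such data lead to small areas with a flat likelihood.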