Honorato-Mauer Jessica, Shah Nirav N, Maihofer Adam X, Zai Clement C, Belangero Sintia, Nievergelt Caroline M, Santoro Marcos, Atkinson Elizabeth G
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
Department of Psychiatry, School of Medicine, University of California at San Diego, La Jolla, CA 92093, USA.
Am J Hum Genet. 2025 Feb 6;112(2):224-234. doi: 10.1016/j.ajhg.2024.12.005. Epub 2025 Jan 2.
In recent years, significant efforts have been made to improve methods for genomic studies of admixed populations using local ancestry inference (LAI). Accurate LAI is crucial to ensure that downstream analyses correctly reflect the genetic ancestry of research participants. Here, we test analytic strategies for LAI to provide guidelines for optimal accuracy, focusing on admixed populations reflective of Latin America's primary continental ancestries: African (AFR), Amerindigenous (AMR), and European (EUR). Simulating linkage-disequilibrium-informed admixed haplotypes under a variety of 2- and 3-way admixture models, we implemented a standard LAI pipeline and tested the impact of reference panel composition, DNA data type, demography, and software parameters on ancestry-specific LAI accuracy. Across all models, AMR tracts had notably lower LAI accuracy than EUR and AFR tracts, with mean true positive rates ranging from 88% to 94% for AMR, from 96% to 99% for EUR, and from 98% to 99% for AFR. When LAI miscalls occurred, they most frequently assigned EUR ancestry to true AMR sites. Concerning reference panel curation, we found that using a reference panel well matched to the target population, even with a smaller sample size, was accurate and the most computationally efficient. Imputation did not harm LAI performance in our tests; rather, we observed that higher variant density improved accuracy. While directly responsive to admixed Latin American cohort compositions, these trends are broadly useful for informing best practices for LAI across admixed populations. Our findings reinforce the need to include more underrepresented populations in sequencing efforts to improve reference panels.
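The ancestry-specific true positive rate reported above can be illustrated with a minimal sketch. It assumes that simulated truth and LAI calls are available as per-site ancestry labels; the function names and the toy labels are hypothetical and are not taken from the study's pipeline or data.

```python
from collections import Counter

ANCESTRIES = ("AFR", "AMR", "EUR")


def confusion_counts(true_calls, inferred_calls):
    """Count (true ancestry, inferred ancestry) pairs across sites."""
    if len(true_calls) != len(inferred_calls):
        raise ValueError("true and inferred calls must have the same length")
    return Counter(zip(true_calls, inferred_calls))


def true_positive_rates(counts, ancestries=ANCESTRIES):
    """Per-ancestry TPR: share of sites truly from an ancestry
    that were also called as that ancestry."""
    rates = {}
    for anc in ancestries:
        total = sum(counts[(anc, called)] for called in ancestries)
        # None marks an ancestry that is absent from the truth set.
        rates[anc] = counts[(anc, anc)] / total if total else None
    return rates


# Toy labels for illustration only (not results from the study):
# one true AMR site is miscalled as EUR.
true_calls = ["AFR"] * 4 + ["AMR"] * 5 + ["EUR"] * 5
inferred_calls = ["AFR"] * 4 + ["AMR"] * 4 + ["EUR"] + ["EUR"] * 5

counts = confusion_counts(true_calls, inferred_calls)
rates = true_positive_rates(counts)
```

Keeping the full table of (true, inferred) pairs rather than only the diagonal makes it possible to see which ancestry absorbs the miscalls, for example how often true AMR sites are called EUR.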