Imputation accuracy across global human populations.

Authors

Cahoon Jordan L, Rui Xinyue, Tang Echo, Simons Christopher, Langie Jalen, Chen Minhui, Lo Ying-Chu, Chiang Charleston W K

Affiliations

Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA; Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA.

Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.

Publication information

Am J Hum Genet. 2024 May 2;111(5):979-989. doi: 10.1016/j.ajhg.2024.03.011. Epub 2024 Apr 10.

DOI:10.1016/j.ajhg.2024.03.011
PMID:38604166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11080279/
Abstract

Genotype imputation is now fundamental for genome-wide association studies but lacks fairness due to the underrepresentation of references from non-European ancestries. The state-of-the-art imputation reference panel released by the Trans-Omics for Precision Medicine (TOPMed) initiative improved the imputation of admixed African-ancestry and Hispanic/Latino samples, but imputation for populations primarily residing outside of North America may still fall short in performance due to persisting underrepresentation. To illustrate this point, we imputed the genotypes of over 43,000 individuals across 123 populations around the world and identified numerous populations where imputation accuracy paled in comparison to that of European-ancestry populations. For instance, the mean imputation r-squared (Rsq) for variants with minor allele frequencies between 1% and 5% in Saudi Arabians (n = 1,061), Vietnamese (n = 1,264), Thai (n = 2,435), and Papua New Guineans (n = 776) were 0.79, 0.78, 0.76, and 0.62, respectively, compared to 0.90-0.93 for comparable European populations matched in sample size and SNP array content. Outside of Africa and Latin America, Rsq appeared to decrease as genetic distances to European-ancestry reference increased, as predicted. Using sequencing data as ground truth, we also showed that Rsq may over-estimate imputation accuracy for non-European populations more than European populations, suggesting further disparity in accuracy between populations. Using 1,496 sequenced individuals from Taiwan Biobank as a second reference panel to TOPMed, we also assessed a strategy to improve imputation for non-European populations with meta-imputation, but this design did not improve accuracy across frequency spectra. Taken together, our analyses suggest that we must ultimately strive to increase diversity and size to promote equity within genetics research.
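The comparison described in the abstract, between the reported Rsq and accuracy measured against sequencing data within a minor allele frequency bin, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the dictionary layout (`rsq`, `truth`, `dosages`), computing the frequency from the sequenced genotypes, and treating the 1% to 5% bin as closed below and open above are all assumptions made for illustration.

```python
import math
from statistics import mean


def squared_correlation(xs, ys):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # no variation on one side: correlation is undefined
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1, or 2 copies of the alternate allele."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def summarize_maf_bin(variants, lo=0.01, hi=0.05):
    """Return (mean reported Rsq, mean true r2, variant count) for lo <= MAF < hi.

    Each variant is a dict (hypothetical layout) with:
      "rsq":     quality value reported for the variant by the imputation step
      "truth":   sequenced genotypes (0/1/2), treated as ground truth
      "dosages": imputed alternate-allele dosages for the same individuals
    MAF is computed from the sequenced genotypes (an assumption of this sketch).
    """
    reported, observed = [], []
    for v in variants:
        maf = minor_allele_frequency(v["truth"])
        if not (lo <= maf < hi):
            continue
        r2 = squared_correlation(v["dosages"], v["truth"])
        if math.isnan(r2):
            continue  # skip variants where the correlation is undefined
        reported.append(v["rsq"])
        observed.append(r2)
    if not reported:
        return math.nan, math.nan, 0
    return mean(reported), mean(observed), len(reported)


if __name__ == "__main__":
    # Toy data: 20 individuals, one heterozygote, so MAF = 1/40 = 2.5%.
    toy = [{"rsq": 0.80, "truth": [0] * 19 + [1], "dosages": [0.1] * 19 + [0.4]}]
    print(summarize_maf_bin(toy))
```

A mean reported Rsq that exceeds the mean true r2 in a bin would correspond to the over-estimation the abstract describes. A real analysis would read both values from imputation output and sequencing call sets rather than from in-memory lists.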

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a09e/11080279/89c49f6fbfcc/fx1.jpg
