Choice of population structure informative principal components for adjustment in a case-control study.

Affiliation

Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue, Boston MA 02118, USA.

Publication information

BMC Genet. 2011 Jul 19;12:64. doi: 10.1186/1471-2156-12-64.

DOI:10.1186/1471-2156-12-64

PMID:21771328

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3150322/

Abstract

BACKGROUND

There are many ways to perform adjustment for population structure. It remains unclear what the optimal approach is and whether the optimal approach varies by the type of samples and substructure present. The simplest and most straightforward approach is to adjust for the continuous principal components (PCs) that capture ancestry. Through simulation, we explored the issue of which ancestry informative PCs should be adjusted for in an association model to control for the confounding nature of population structure while maintaining maximum power. A thorough examination of selecting PCs for adjustment in a case-control study across the possible structure scenarios that could occur in a genome-wide association study has not been previously reported.
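To make the idea of adjusting an association model for ancestry PCs concrete, here is a minimal sketch in Python with NumPy. It is not the authors' simulation code. The function names (`top_pcs`, `fit_logistic`, `snp_wald_z`), the sample sizes, the allele frequencies, and the prevalences are all illustrative assumptions. The toy data contain two sub-populations that differ in both allele frequency and disease prevalence, and a tested SNP with no true effect, so any unadjusted signal comes from confounding by structure.

```python
import numpy as np

def top_pcs(genotypes, k):
    """Top-k principal component scores of an (n_samples, n_snps) genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # Left singular vectors scaled by singular values give the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Newton-Raphson logistic regression; returns coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

def snp_wald_z(snp, y, pcs=None):
    """Wald z statistic for the SNP term, optionally adjusting for PC covariates."""
    columns = [np.ones(len(snp)), snp]  # intercept and tested SNP
    if pcs is not None:
        columns.append(pcs)
    X = np.column_stack(columns)
    beta, se = fit_logistic(X, y)
    return beta[1] / se[1]

# Toy data (all numeric values are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n_per_pop, n_markers = 500, 200
pop = np.repeat([0, 1], n_per_pop)
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_markers), 0.05, 0.95)
freqs = np.where(pop[:, None] == 0, freq_a, freq_b)  # per-sample allele frequencies
markers = rng.binomial(2, freqs).astype(float)       # markers used only to build PCs
snp = rng.binomial(2, np.where(pop == 0, 0.1, 0.6)).astype(float)  # structured, null SNP
y = rng.binomial(1, np.where(pop == 0, 0.2, 0.6)).astype(float)    # structured phenotype

pcs = top_pcs(markers, k=2)
z_unadjusted = snp_wald_z(snp, y)
z_adjusted = snp_wald_z(snp, y, pcs)
print(f"unadjusted z = {z_unadjusted:.2f}, PC-adjusted z = {z_adjusted:.2f}")
```

In this toy setting the unadjusted statistic is inflated even though the SNP has no effect, and including the top PCs as covariates should pull it back toward the null.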

RESULTS

We found that when the SNP and phenotype frequencies do not vary over the sub-populations, all methods of selection provided similar power and appropriate Type I error for association. When the SNP is not structured and the phenotype has large structure, then selection methods that do not select PCs for inclusion as covariates generally provide the most power. When there is a structured SNP and a non-structured phenotype, selection methods that include PCs in the model have greater power. When both the SNP and the phenotype are structured, all methods of selection have similar power.
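The four structure scenarios described above can be sketched as a small data generator. This is a hedged illustration only: the function name `simulate_scenario`, the two-population design, and every frequency and prevalence value are assumptions, not the parameters used in the paper. The SNP has no effect on the phenotype here, so the sketch only shows how the structure differs between scenarios.

```python
import numpy as np

def simulate_scenario(snp_structured, pheno_structured, n_per_pop=1000, seed=0):
    """Simulate one SNP and one binary phenotype in two sub-populations.

    All parameter values below are hypothetical and chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    if snp_structured:
        snp_freq = np.where(pop == 0, 0.1, 0.5)    # allele frequency varies by sub-population
    else:
        snp_freq = np.full(pop.size, 0.3)          # same allele frequency everywhere
    if pheno_structured:
        prevalence = np.where(pop == 0, 0.2, 0.6)  # case frequency varies by sub-population
    else:
        prevalence = np.full(pop.size, 0.4)        # same case frequency everywhere
    snp = rng.binomial(2, snp_freq)
    pheno = rng.binomial(1, prevalence)
    return pop, snp, pheno

# Summarise the between-population gaps in each of the four scenarios.
for snp_structured in (False, True):
    for pheno_structured in (False, True):
        pop, snp, pheno = simulate_scenario(snp_structured, pheno_structured)
        freq_gap = (snp[pop == 1].mean() - snp[pop == 0].mean()) / 2
        prev_gap = pheno[pop == 1].mean() - pheno[pop == 0].mean()
        print(f"SNP structured={snp_structured}, phenotype structured={pheno_structured}: "
              f"allele-frequency gap={freq_gap:.2f}, prevalence gap={prev_gap:.2f}")
```

Confounding can only arise in the scenario where both gaps are non-zero, which is consistent with the abstract's observation that the choice of PCs matters differently across scenarios.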

CONCLUSIONS

Standard practice in genome-wide association studies is to include a fixed number of PCs. Based on our findings, we conclude that if power is not a concern, then selecting the same set of top PCs for adjustment for all SNPs in logistic regression achieves appropriate Type I error. However, standard practice is not optimal in all scenarios: to optimize power for structured SNPs in the presence of unstructured phenotypes, PCs that are associated with the tested SNP should be included in the logistic model.
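One way to read the recommendation about including PCs associated with the tested SNP is as a per-SNP screening step. The sketch below is an assumption about how such a screen could look, not the selection procedure evaluated in the paper. The function name `select_pcs_for_snp`, the correlation-based t statistic, and the 1.96 cutoff (a normal approximation to a two-sided 5% test) are all illustrative choices.

```python
import numpy as np

def select_pcs_for_snp(snp, pcs, z_crit=1.96):
    """Return indices of PCs whose correlation with the SNP passes a t-type screen.

    The cutoff z_crit is a hypothetical choice (normal approximation, about 5% two-sided).
    """
    n = len(snp)
    selected = []
    for j in range(pcs.shape[1]):
        r = np.corrcoef(snp, pcs[:, j])[0, 1]
        # t statistic for a Pearson correlation; the max() guards against r**2 == 1.
        t = r * np.sqrt((n - 2) / max(1.0 - r**2, 1e-12))
        if abs(t) > z_crit:
            selected.append(j)
    return selected

# Toy data (assumed values): PC 0 tracks sub-population membership, PC 1 is pure noise.
rng = np.random.default_rng(1)
n_per_pop = 500
pop = np.repeat([0, 1], n_per_pop)
pcs = np.column_stack([
    pop - 0.5 + rng.normal(0.0, 0.1, pop.size),
    rng.normal(0.0, 1.0, pop.size),
])
snp = rng.binomial(2, np.where(pop == 0, 0.1, 0.6)).astype(float)  # structured SNP

selected = select_pcs_for_snp(snp, pcs)
covariates = pcs[:, selected]  # only these PCs would enter the logistic model for this SNP
print("PCs selected for this SNP:", selected)
```

Because the screen is run separately for each SNP, an unstructured SNP would typically pick up few or no PCs, while a structured SNP keeps the PCs that track its frequency differences.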

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/33ac/3150322/57cc4b4231cf/1471-2156-12-64-1.jpg
