Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.

Author information

Conomos Matthew P, Miller Michael B, Thornton Timothy A

Affiliation

Department of Biostatistics, University of Washington, Seattle, Washington, 98195, United States of America.

Publication information

Genet Epidemiol. 2015 May;39(4):276-93. doi: 10.1002/gepi.21896. Epub 2015 Mar 23.

Abstract

Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
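
The two-stage idea described in the abstract (PCA on an unrelated subset, then prediction of components for everyone else) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the unrelated, ancestry-representative subset has already been chosen and is passed in as `unrelated_idx`, and it uses a plain SVD projection onto SNP loadings as a stand-in for the prediction step. The function name `project_pcs` and its parameters are hypothetical.

```python
import numpy as np


def project_pcs(genotypes, unrelated_idx, n_components=2):
    """Sketch: PCA on an unrelated subset, then projection of all samples.

    genotypes     : (n_individuals, n_snps) array of 0/1/2 allele counts
    unrelated_idx : indices of individuals assumed to be mutually unrelated
                    (hypothetical input; choosing this subset is not shown)
    Returns an (n_individuals, n_components) array of PC scores.
    """
    G = np.asarray(genotypes, dtype=float)
    unrelated_idx = np.asarray(unrelated_idx)

    # Estimate allele frequencies from the unrelated subset only, so that
    # family structure does not influence the centering and scaling.
    p = G[unrelated_idx].mean(axis=0) / 2.0

    # Drop SNPs that are monomorphic in the unrelated subset.
    keep = (p > 0.0) & (p < 1.0)
    G, p = G[:, keep], p[keep]

    # Standardize every individual with the subset-derived frequencies.
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

    # PCA on the unrelated subset via SVD; rows of Vt are SNP loadings.
    _, _, Vt = np.linalg.svd(Z[unrelated_idx], full_matrices=False)
    loadings = Vt[:n_components].T

    # Project all individuals (unrelated and related) onto those loadings.
    return Z @ loadings
```

For individuals in the unrelated subset, the projection reproduces their ordinary PCA scores. For their relatives, the scores are predicted from axes that were estimated without any family members, which is the property the abstract emphasizes.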
