Avera Institute for Human Genetics, Avera McKennan Hospital and University Health Center, Sioux Falls, SD 57105, USA.
Department of Biological Psychology, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands.
Genes (Basel). 2023 Jul 22;14(7):1497. doi: 10.3390/genes14071497.
Accurate inference of genetic ancestry is crucial for population-based association studies, because it allows population heterogeneity and structure to be taken into account. This study analyzes genome-wide SNP data from the Netherlands Twin Register to compare genetic ancestry estimates. It focuses on comparing ancestry estimates between family members and for individuals genotyped on multiple arrays (Affymetrix 6.0, Affymetrix Axiom, and Illumina GSA). Two conventional methods, principal component analysis (PCA) and ADMIXTURE, were used to estimate ancestry. Each serves its own purpose, and they were not intended for direct comparison with each other. The results show that as the degree of genetic relatedness decreases, the Euclidean distances between the ancestry estimates of family members increase significantly (empirical p < 0.001), regardless of the estimation method and genotyping array. Ancestry estimates for individuals genotyped on multiple arrays also show statistically significant differences (empirical p < 0.001). Additionally, this study compares the ancestry estimates of non-identical twin offspring of ancestrally diverse parents with those of offspring of ancestrally similar parents. The results indicate a weak but statistically significant correlation between the variation in ancestry estimates among offspring and the difference in ancestry estimates between their parents (Spearman's rho = 0.07, p = 0.005). This study highlights the utility of current methods for inferring genetic ancestry and emphasizes the importance of reference population composition in determining ancestry estimates.
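The abstract summarizes ancestry differences as Euclidean distances between estimates and reports empirical p-values, but it does not say how those p-values were obtained. The sketch below is only an illustration under stated assumptions. Each person's ancestry estimate is treated as a numeric vector, such as the top PC scores or the ADMIXTURE ancestry proportions. The function `empirical_p` (a hypothetical helper, not the authors' code) runs a one-sided permutation test on the difference in mean distance between more closely related pairs and more distantly related pairs. The study's actual procedure may differ.

```python
import math
import random
from statistics import mean


def ancestry_distance(a, b):
    """Euclidean distance between two ancestry vectors
    (e.g., PC scores or ADMIXTURE proportions for two individuals)."""
    if len(a) != len(b):
        raise ValueError("ancestry vectors must have the same length")
    return math.dist(a, b)


def empirical_p(close_dists, distant_dists, n_perm=10_000, seed=0):
    """Hypothetical one-sided permutation test: are distances for more
    distantly related pairs larger on average than for closer pairs?"""
    observed = mean(distant_dists) - mean(close_dists)
    pooled = list(close_dists) + list(distant_dists)
    n_close = len(close_dists)
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_perm):
        # Shuffle the group labels by shuffling the pooled distances
        rng.shuffle(pooled)
        diff = mean(pooled[n_close:]) - mean(pooled[:n_close])
        if diff >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy, made-up distances for illustration only (not study data)
    sibling_pairs = [0.10, 0.12, 0.09, 0.11, 0.10]
    cousin_pairs = [0.50, 0.55, 0.60, 0.52, 0.58]
    print(ancestry_distance([0.2, 0.8], [0.5, 0.5]))
    print(empirical_p(sibling_pairs, cousin_pairs, n_perm=2000))
```

Using the same distance for both the PCA and the ADMIXTURE outputs keeps the comparison method-agnostic. However, distances in PC space and in ancestry-proportion space are on different scales, so they should only be compared within a method.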