Hou Zhuoran, Ochoa Alejandro
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA.
Duke Center for Statistical Genetics and Genomics, Duke University, Durham, NC 27705, USA.
bioRxiv. 2025 May 15:2025.05.13.653659. doi: 10.1101/2025.05.13.653659.
Heritability is a fundamental parameter of diseases and other traits, quantifying the contribution of genetics to a trait. Kinship matrices, also known as Genetic Relatedness Matrices ("GRMs"), are required for heritability estimation with variance components models. However, the most common "standard" kinship estimator, employed by GCTA and other approaches, can be severely biased in structured populations. In this study, we characterize the heritability estimation biases in GCTA that result from kinship estimation biases under population structure. For the standard (ROM) kinship estimator, we derive a closed-form expression for the heritability bias as a function of the mean kinship value and the true heritability. The standard (MOR) estimator is the most widely used in practice, and it exhibits more severe bias than ROM because it upweights low-frequency variants. Using simulation studies with admixture and family structures, as well as traits simulated from 1000 Genomes genotypes, we find that only Popkin, the sole unbiased population kinship estimator, produces unbiased heritability estimates in structured settings. Pedigree-only estimates are biased upward when there is population structure. Finally, we analyze three structured datasets with real phenotypes: the San Antonio Family Study, the Hispanic Community Health Study / Study of Latinos, and a multiethnic Nephrotic Syndrome cohort. Compared to the other two estimators, the standard MOR estimator can produce either downward or upward heritability biases, depending on population structure and the variant frequency spectrum. Overall, common kinship estimators result in heritability estimation biases when applied to structured populations, a challenge that Popkin successfully overcomes.