Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706.
Center for Economic and Social Research, University of Southern California, Los Angeles, CA 90089.
Proc Natl Acad Sci U S A. 2024 Oct 29;121(44):e2408715121. doi: 10.1073/pnas.2408715121. Epub 2024 Oct 21.
Epidemiologic associations estimated from observational data are often confounded by genetics due to pervasive pleiotropy among complex traits. Many studies either neglect genetic confounding altogether or rely on adjusting for polygenic scores (PGS) in regression analysis. In this study, we show that, because of measurement error and model misspecification, the commonly employed PGS approach is inadequate for removing genetic confounding. To tackle this challenge, we introduce PENGUIN, a principled framework for polygenic genetic confounding control based on variance component estimation. In addition, we present extensions of this approach that can estimate genetically unconfounded associations using GWAS summary statistics alone as input, and that can estimate such associations across multiple generations of study samples. Through simulations, we demonstrate superior statistical properties of PENGUIN compared to existing approaches. Applying our method to multiple population cohorts, we reveal and remove substantial genetic confounding in the associations of educational attainment with various complex traits and between parental and offspring education. Our results show that PENGUIN is an effective solution for genetic confounding control in observational data analysis, with broad applications in future epidemiologic association studies.
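The claim that PGS adjustment leaves residual confounding under measurement error can be illustrated with a small simulation. The sketch below is not PENGUIN and is not taken from the paper: it is a toy model in which all effect sizes, the noise level of the score, and the names `g`, `pgs`, and `slope_on_x` are assumptions chosen for illustration. An exposure `x` and an outcome `y` share a genetic component `g`, the true effect of `x` on `y` is set to zero, and the PGS is modeled as `g` plus independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed toy model: a shared genetic component g confounds x and y.
g = rng.standard_normal(n)                    # true genetic component (unobserved in practice)
x = 0.7 * g + rng.standard_normal(n)          # exposure
y = 0.0 * x + 0.7 * g + rng.standard_normal(n)  # outcome; true effect of x is zero
pgs = g + 1.5 * rng.standard_normal(n)        # PGS modeled as a noisy measurement of g


def slope_on_x(y, x, covariate=None):
    """OLS coefficient of x in a regression of y on x (and an optional covariate)."""
    cols = [np.ones_like(x), x]
    if covariate is not None:
        cols.append(covariate)
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


naive = slope_on_x(y, x)              # no genetic adjustment
pgs_adjusted = slope_on_x(y, x, pgs)  # adjust for the noisy PGS
oracle = slope_on_x(y, x, g)          # adjust for the true genetic component

print(f"naive:        {naive:.3f}")
print(f"PGS-adjusted: {pgs_adjusted:.3f}")
print(f"oracle:       {oracle:.3f}")
```

Under these assumed parameters, the PGS-adjusted estimate shrinks only part of the way from the naive estimate toward the true value of zero, while adjusting for the unobservable `g` removes the bias. This covers only the measurement-error argument in a single-component setting; it does not address model misspecification or the variance component approach the abstract describes.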