Department of Economics, New York University, New York, NY, USA.
Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.
Nat Hum Behav. 2021 Dec;5(12):1744-1758. doi: 10.1038/s41562-021-01119-3. Epub 2021 Jun 17.
Polygenic indexes (PGIs) are DNA-based predictors. Their value for research in many scientific disciplines is growing rapidly. As a resource for researchers, we used a consistent methodology to construct PGIs for 47 phenotypes in 11 datasets. To maximize the PGIs' prediction accuracies, we constructed them using genome-wide association studies (some not previously published) from multiple data sources, including 23andMe and UK Biobank. We present a theoretical framework to help interpret analyses involving PGIs. A key insight is that a PGI can be understood as an unbiased but noisy measure of a latent variable we call the 'additive SNP factor'. Regressions in which the true regressor is this factor but the PGI is used as its proxy therefore suffer from errors-in-variables bias. We derive an estimator that corrects for the bias, illustrate the correction, and make a Python tool for implementing it publicly available.
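The errors-in-variables point can be illustrated with a small simulation. The sketch below is not the authors' estimator or their Python tool; it is a generic textbook disattenuation under classical measurement error, and every name and parameter in it (`simulate_attenuation`, `beta`, `noise_var`) is hypothetical. It assumes the noise variance, and hence the reliability of the proxy, is known, which in practice would have to be estimated.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=0.5, noise_var=1.0, seed=0):
    """Regress an outcome on a noisy proxy and undo the attenuation.

    Hypothetical setup: `g` stands in for the latent factor (variance 1),
    `pgi` is an unbiased but noisy measure of it, and `y` depends on `g`.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)                          # latent regressor
    pgi = g + rng.normal(0.0, np.sqrt(noise_var), n)    # noisy proxy
    y = beta * g + rng.standard_normal(n)               # outcome

    # OLS slope of y on the proxy: cov(pgi, y) / var(pgi).
    c = np.cov(pgi, y)
    naive = c[0, 1] / c[0, 0]

    # Reliability = var(g) / var(pgi); assumed known here for illustration.
    reliability = 1.0 / (1.0 + noise_var)
    corrected = naive / reliability
    return naive, corrected


naive, corrected = simulate_attenuation()
print(f"naive slope:     {naive:.3f}")
print(f"corrected slope: {corrected:.3f}")
```

With these assumed settings the naive slope is shrunk toward zero by roughly the reliability factor, and dividing by that factor recovers a value close to `beta`.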