Veller Carl, Coop Graham
Department of Evolution and Ecology, and Center for Population Biology, University of California, Davis, CA 95616.
bioRxiv. 2023 Feb 27:2023.02.26.530052. doi: 10.1101/2023.02.26.530052.
A central aim of genome-wide association studies (GWASs) is to estimate direct genetic effects: the causal effects on an individual's phenotype of the alleles that they carry. However, estimates of direct effects can be subject to genetic and environmental confounding, and can also absorb the 'indirect' genetic effects of relatives' genotypes. Recently, an important development in controlling for these confounds has been the use of within-family GWASs, which, because of the randomness of Mendelian segregation within pedigrees, are often interpreted as producing unbiased estimates of direct effects. Here, we present a general theoretical analysis of the influence of confounding in standard population-based and within-family GWASs. We show that, contrary to common interpretation, family-based estimates of direct effects can be biased by genetic confounding. In humans, such biases will often be small per-locus, but can be compounded when effect size estimates are used in polygenic scores. We illustrate the influence of genetic confounding on population- and family-based estimates of direct effects using models of assortative mating, population stratification, and stabilizing selection on GWAS traits. We further show how family-based estimates of indirect genetic effects, based on comparisons of parentally transmitted and untransmitted alleles, can suffer substantial genetic confounding. In addition to known biases that can arise in family-based GWASs when interactions between family members are ignored, we show that biases can also arise from gene-by-environment (G×E) interactions when parental genotypes are not distributed identically across interacting environmental and genetic backgrounds. We conclude that, while family-based studies have placed GWAS estimation on a more rigorous footing, they carry subtle issues of interpretation that arise from confounding and interactions.
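The contrast between population-based and within-family designs can be illustrated with a minimal simulation sketch. It is not the authors' model. It assumes a single biallelic locus, two equally sized subpopulations with hypothetical allele frequencies of 0.2 and 0.8, random mating within each subpopulation, and a purely environmental phenotypic offset between subpopulations. The names `simulate_sib_pairs` and `slope` and all parameter values are illustrative assumptions. The sketch covers only environmental confounding through stratification, the case in which sibling differences are expected to remove the bias. It does not reproduce the genetic confounding from other loci, or the interaction effects, that the abstract identifies as remaining sources of bias in family-based estimates.

```python
import numpy as np


def simulate_sib_pairs(n_families=200_000, beta=0.5, seed=0):
    """Simulate sibling pairs at one locus under environmental stratification.

    Assumed (hypothetical) setup: two subpopulations with allele frequencies
    0.2 and 0.8, and an environmental offset of 0 or 1 on the phenotype.
    `beta` is the true direct effect of each copy of the focal allele.
    """
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=n_families)      # subpopulation of each family
    freq = np.where(pop == 0, 0.2, 0.8)            # allele frequency per family
    env = np.where(pop == 0, 0.0, 1.0)             # shared environmental offset

    # Parental alleles, shape (family, parent, allele copy); mating is random
    # within a subpopulation.
    parents = (rng.random((n_families, 2, 2)) < freq[:, None, None]).astype(int)

    def child_genotype():
        # Mendelian segregation: each parent transmits one of its two alleles
        # with probability 1/2, independently for each child.
        pick = rng.integers(0, 2, size=(n_families, 2))
        transmitted = np.take_along_axis(parents, pick[:, :, None], axis=2)
        return transmitted[:, :, 0].sum(axis=1)

    g1, g2 = child_genotype(), child_genotype()
    y1 = beta * g1 + env + rng.normal(size=n_families)
    y2 = beta * g2 + env + rng.normal(size=n_families)
    return g1, g2, y1, y2


def slope(x, y):
    """Ordinary least-squares slope of y on x (with an intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


if __name__ == "__main__":
    g1, g2, y1, y2 = simulate_sib_pairs()
    # Population-based estimate: one sibling per family, no family structure used.
    print("population estimate:   ", slope(g1, y1))
    # Within-family estimate: regress sibling phenotype differences on
    # sibling genotype differences, which cancels the shared offset.
    print("within-family estimate:", slope(g1 - g2, y1 - y2))
```

Under these assumptions, the population-based slope should land well above `beta`, because genotype covaries with the environmental offset across subpopulations, while the sibling-difference slope should sit close to `beta`. Extending the sketch to the cases analyzed in the paper would require additional causal loci whose genotypes are correlated with the focal locus within parents, for example through assortative mating.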