Kingston Hanley, Stilp Adrienne M, Gordon William, Broome Jai, Gogarten Stephanie M, Ling Hua, Barnard John, Dugan-Perez Shannon, Ellinor Patrick T, Gabriel Stacey, Germer Soren, Gibbs Richard A, Gupta Namrata, Rice Kenneth, Smith Albert V, Zody Michael C, Blackman Scott M, Cutting Garry, Knowles Michael R, Zhou Yi-Hui, Rosenfeld Margaret, Gibson Ronald L, Bamshad Michael, Fohner Alison, Blue Elizabeth E
Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA.
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
HGG Adv. 2022 May 12;3(3):100117. doi: 10.1016/j.xhgg.2022.100117. eCollection 2022 Jul 14.
F508del (c.1521_1523delCTT, p.Phe508delPhe) is the most common pathogenic allele underlying cystic fibrosis (CF), and its frequency varies in a geographic cline across Europe. We hypothesized that genetic variation associated with this cline is overrepresented in a large cohort (N > 5,000) of persons with CF who underwent whole-genome sequencing and that this pattern could result in spurious associations between variants correlated with both the F508del genotype and CF-related outcomes. Using principal-component (PC) analyses, we showed that variation in the CFTR region disproportionately contributes to a PC explaining a relatively high proportion of genetic variance. Variation near CFTR was correlated with population structure among persons with CF, and this correlation was driven by a subset of the sample inferred to have European ancestry. We performed genome-wide association studies comparing persons with CF with one versus two copies of the F508del allele; this allowed us to identify genetic variation associated with the F508del allele and to determine that standard PC-adjustment strategies eliminated the significant association signals. Our results suggest that PC adjustment can adequately prevent spurious associations between genetic variants and CF-related traits and is therefore an effective tool to control for population structure, even when population structure is confounded with disease severity and a common pathogenic variant.