Department of Veterinary and Biomedical Sciences, College of Agricultural Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America.
Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America.
PLoS Genet. 2021 Jun 4;17(6):e1009534. doi: 10.1371/journal.pgen.1009534. eCollection 2021 Jun.
Choosing a traditional genetic encoding (additive, dominant, or recessive) requires assuming a genetic model for each single nucleotide polymorphism (SNP). Yet SNPs across the genome are unlikely to all follow the same genetic model, and running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which each SNP is assigned a heterozygous value based on the genetic model it demonstrates in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found that it outperformed the traditional encoding methods at minor allele frequencies (MAFs) of 10%, 30%, and 50%. Further, EDGE maintained a low false-positive rate, while the additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) was performed for each phenotype using the traditional encodings, and the top results of the multi-encoding GWAS were tested for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)-rs4695885 (MAF: 0.34; intergenic region of chromosome 4), with a Bonferroni-corrected likelihood ratio test (LRT) p-value of 0.018. Using the recessive encoding, a SNP-SNP interaction within 25 kb of these SNPs was also found in data from the UK Biobank: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni-corrected LRT p-value: 0.026). We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.
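To make the encodings concrete, the following minimal Python sketch contrasts the three traditional encodings with an EDGE-style encoding in which the heterozygous genotype receives a data-driven value. The abstract does not say how that value is computed, so the `heterozygous_value` helper, which takes the ratio of a heterozygous effect estimate to a homozygous-alternate effect estimate, is an assumption made for illustration. All function and variable names are hypothetical and do not come from the authors' software.

```python
from typing import List, Optional

# Numeric codes for genotypes carrying 0, 1, or 2 copies of the minor allele.
TRADITIONAL = {
    "additive": (0.0, 1.0, 2.0),
    "dominant": (0.0, 1.0, 1.0),
    "recessive": (0.0, 0.0, 1.0),
}


def heterozygous_value(beta_het: float, beta_hom: float) -> float:
    """Return a heterozygous code on a scale where homozygous-alternate is 1.

    ASSUMPTION (not stated in the abstract): the value is the ratio of the
    heterozygous effect estimate to the homozygous-alternate effect estimate,
    both taken from a prior single-SNP model fit by the caller.
    """
    if beta_hom == 0:
        raise ValueError("homozygous-alternate effect must be non-zero")
    return beta_het / beta_hom


def encode(genotypes: List[int], scheme: str,
           alpha: Optional[float] = None) -> List[float]:
    """Map minor-allele counts (0, 1, 2) to numeric codes for one SNP."""
    if scheme == "edge":
        if alpha is None:
            raise ValueError("the 'edge' scheme needs a heterozygous value")
        codes = (0.0, alpha, 1.0)
    else:
        codes = TRADITIONAL[scheme]
    return [codes[g] for g in genotypes]


def interaction_term(snp_a: List[float], snp_b: List[float]) -> List[float]:
    """Per-sample product of two encoded SNPs, as used in an interaction model."""
    return [a * b for a, b in zip(snp_a, snp_b)]
```

On this 0-to-1 scale, a heterozygous value near 0 behaves like the recessive encoding, a value near 0.5 like a rescaled additive encoding, and a value near 1 like the dominant encoding, so each SNP can take its own code without testing every combination of encodings.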