Woo Hyung Jun, Yu Chenggang, Kumar Kamal, Gold Bert, Reifman Jaques
Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, USA.
Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland, USA.
BMC Genomics. 2016 Aug 30;17(1):695. doi: 10.1186/s12864-016-2871-3.
Genome-wide association studies provide important insights into the genetic component of disease risk. However, an existing challenge is how to incorporate collective effects of interactions beyond the level of independent single nucleotide polymorphism (SNP) tests. While methods considering each SNP pair separately have provided insights, a large portion of expected heritability may reside in higher-order interaction effects.
We describe an inference approach (discrete discriminant analysis; DDA) designed to probe collective interactions while treating both genotypes and phenotypes as random variables. The genotype distributions in case and control groups are modeled separately based on empirical allele frequency and covariance data, whose differences yield disease risk parameters. We compared pairwise tests and collective inference methods, the latter based both on DDA and logistic regression. Analyses using simulated data demonstrated that significantly higher sensitivity and specificity can be achieved with collective inference in comparison to pairwise tests, and with DDA in comparison to logistic regression. Using age-related macular degeneration (AMD) data, we demonstrated two possible applications of DDA. In the first application, a genome-wide SNP set is reduced to a small number (∼100) of variants via filtering, and SNP pairs with significant interactions are identified. We found that interactions between SNPs with the highest AMD association were epigenetically active in the liver, adipocytes, and mesenchymal stem cells. In the second application, multiple groups of SNPs were formed from the genome-wide data and their relative strengths of association were compared using cross-validation. This analysis allowed us to discover novel collections of loci for which interactions between SNPs play significant roles in their disease association. In particular, we considered pathway-based groups of SNPs containing up to ∼10,000 variants in each group. In addition to pathways related to complement activation, our collective inference pointed to pathway groups involved in phospholipid synthesis, oxidative stress, and apoptosis, consistent with the AMD pathogenesis mechanism in which the dysfunction of retinal pigment epithelium cells plays a central role.
The simultaneous inference of collective interaction effects within a set of SNPs has the potential to reveal novel aspects of disease association.
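The core idea described above (model the genotype distribution of cases and controls separately from their means and covariances, then read disease risk parameters off the differences) can be illustrated with a small sketch. This is not the authors' DDA implementation: it replaces the discrete genotype model with a Gaussian approximation, and the function names (`fit_group`, `risk_score`, `simulate`), the ridge term, and the simulated data are all assumptions made purely for illustration.

```python
import numpy as np

def fit_group(G, ridge=0.1):
    """Fit one group (cases or controls) from its genotype matrix G (samples x SNPs).

    ASSUMPTION: a Gaussian approximation stands in for the discrete model;
    `ridge` is an illustrative regularizer, not a value from the paper.
    """
    mu = G.mean(axis=0)
    cov = np.cov(G, rowvar=False) + ridge * np.eye(G.shape[1])
    prec = np.linalg.inv(cov)              # inverse covariance (pairwise couplings)
    _, logdet = np.linalg.slogdet(cov)
    return mu, prec, logdet

def risk_score(G, case_model, control_model):
    """Log-likelihood ratio (case vs. control) for each row of G."""
    def loglik(model):
        mu, prec, logdet = model
        d = G - mu
        return -0.5 * (np.einsum("...i,ij,...j->...", d, prec, d) + logdet)
    return loglik(case_model) - loglik(control_model)

def simulate(n, p, maf, coupled, rng):
    """Hypothetical toy data: minor-allele counts (0, 1, 2) for p SNPs.

    If `coupled`, SNP 1 copies SNP 0 in 80% of samples, which creates an
    interaction without changing either SNP's marginal allele frequency.
    """
    G = rng.binomial(2, maf, size=(n, p)).astype(float)
    if coupled:
        copy = rng.random(n) < 0.8
        G[copy, 1] = G[copy, 0]
    return G

rng = np.random.default_rng(0)
case_model = fit_group(simulate(2000, 5, 0.3, True, rng))
control_model = fit_group(simulate(2000, 5, 0.3, False, rng))

# Interaction parameters: difference between group-wise couplings.
interaction = control_model[1] - case_model[1]

# Score held-out samples with the fitted models.
case_scores = risk_score(simulate(1000, 5, 0.3, True, rng), case_model, control_model)
control_scores = risk_score(simulate(1000, 5, 0.3, False, rng), case_model, control_model)
print("mean score, cases:   ", case_scores.mean())
print("mean score, controls:", control_scores.mean())
```

In this toy setting neither SNP 0 nor SNP 1 differs in allele frequency between groups, so an independent single-SNP test would have nothing to detect; the signal sits entirely in the off-diagonal entry of `interaction` for that pair, which is the kind of effect collective inference is meant to capture.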