BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000 Liège, Belgium.
Institut Curie, PSL Research University, F-75005 Paris, France.
Gigascience. 2022 Feb 4;11. doi: 10.1093/gigascience/giab093.
Detecting epistatic interactions at the gene level is essential to understanding the biological mechanisms of complex diseases. Unfortunately, genome-wide interaction association studies involve many statistical challenges that make such detection hard. We propose a multi-step protocol for epistasis detection along the edges of a gene-gene co-function network. Such an approach reduces the number of tests performed and provides interpretable interactions while keeping type I error controlled. Yet, mapping gene interactions into testable single-nucleotide polymorphism (SNP)-interaction hypotheses, as well as computing gene pair association scores from SNP pair ones, is not trivial.
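As a rough illustration of why this mapping step matters, the sketch below expands gene-gene network edges into the SNP pairs that would actually be tested. It is a minimal example under stated assumptions, not the authors' pipeline: the function name `snp_pairs_for_edges`, the dictionary layout of the SNP-gene mapping, and the gene and SNP identifiers are all hypothetical.

```python
from itertools import product


def snp_pairs_for_edges(gene_edges, snp_to_genes):
    """Expand gene-gene edges into candidate SNP pairs (hypothetical helper).

    gene_edges   : iterable of (gene_a, gene_b) tuples from a co-function network
    snp_to_genes : dict mapping each SNP id to the genes it is assigned to
                   (by position, eQTL, or chromatin contact; assumed precomputed)
    """
    # Invert the SNP -> genes mapping into gene -> SNPs.
    gene_to_snps = {}
    for snp, genes in snp_to_genes.items():
        for gene in genes:
            gene_to_snps.setdefault(gene, set()).add(snp)

    pairs = {}
    for gene_a, gene_b in gene_edges:
        snps_a = gene_to_snps.get(gene_a, set())
        snps_b = gene_to_snps.get(gene_b, set())
        # Keep unordered pairs only once and never pair a SNP with itself,
        # which can happen when one SNP maps to both genes.
        candidates = {
            tuple(sorted((s, t))) for s, t in product(snps_a, snps_b) if s != t
        }
        pairs[(gene_a, gene_b)] = sorted(candidates)
    return pairs


# Toy input: rs3 maps to both genes, so it is paired with rs1 and rs2 but not itself.
example = snp_pairs_for_edges(
    [("G1", "G2")],
    {"rs1": ["G1"], "rs2": ["G2"], "rs3": ["G1", "G2"]},
)
```

Only SNP pairs produced this way are tested, which is how restricting the search to network edges reduces the number of tests relative to an exhaustive genome-wide scan.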
Here we compare 3 SNP-gene mappings (positional overlap, expression quantitative trait loci, and proximity in 3D structure) and use the adaptive truncated product method to compute gene pair scores. This method is non-parametric, does not require a known null distribution, and is fast to compute. We apply multiple variants of this protocol to a genome-wide association study dataset on inflammatory bowel disease. Different configurations produced different results, highlighting that various mechanisms are implicated in inflammatory bowel disease, while at the same time, results overlapped with known disease characteristics. Importantly, the proposed pipeline also differs from a conventional approach where no network is used, showing the potential for additional discoveries when prior biological knowledge is incorporated into epistasis detection.
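The adaptive truncated product idea can be sketched as follows. This is a generic permutation-based version written for illustration, not the implementation used in the study: the function names, the default truncation thresholds, and the assumption that null SNP-pair P-values come from phenotype permutations are all choices made here. For each threshold, the statistic is the product (here, the sum of logs) of the SNP-pair P-values at or below that threshold; the smallest per-threshold P-value is then calibrated against the same permutations.

```python
import math
from bisect import bisect_right


def truncated_log_products(pvals, thresholds):
    """For each threshold tau, sum log(p) over the P-values with p <= tau."""
    # P-values are floored at 1e-300 to avoid log(0).
    return [
        sum(math.log(max(p, 1e-300)) for p in pvals if p <= tau)
        for tau in thresholds
    ]


def adaptive_truncated_product(obs_pvals, null_pvals, thresholds=(0.001, 0.01, 0.05)):
    """Gene-pair P-value from SNP-pair P-values (illustrative sketch).

    obs_pvals  : SNP-pair P-values observed for one gene pair
    null_pvals : list of P-value lists, one per permutation (assumed input)
    thresholds : candidate truncation points (hypothetical defaults)
    """
    obs = truncated_log_products(obs_pvals, thresholds)
    null = [truncated_log_products(row, thresholds) for row in null_pvals]
    n_perm = len(null)

    min_obs = 1.0
    min_null = [1.0] * n_perm
    for j in range(len(thresholds)):
        column = [row[j] for row in null]
        sorted_col = sorted(column)
        # Smaller (more negative) statistics indicate stronger signal, so the
        # empirical P-value counts null statistics at or below the observed one.
        p_obs = (1 + bisect_right(sorted_col, obs[j])) / (n_perm + 1)
        min_obs = min(min_obs, p_obs)
        # Rank every permutation against the null set at the same threshold.
        for b, value in enumerate(column):
            p_b = bisect_right(sorted_col, value) / n_perm
            min_null[b] = min(min_null[b], p_b)

    # Calibrate the minimum over thresholds with the same permutations.
    exceed = sum(1 for m in min_null if m <= min_obs)
    return (1 + exceed) / (n_perm + 1)
```

Because the null is built empirically from permutations, no closed-form null distribution is needed, and the smallest attainable P-value is bounded by 1 / (number of permutations + 1).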