Batista Sandra, Madar Vered Senderovich, Freda Philip J, Bhandary Priyanka, Ghosh Attri, Matsumoto Nicholas, Chitre Apurva S, Palmer Abraham A, Moore Jason H
Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA.
, Chapel Hill, NC, USA.
BioData Min. 2024 Feb 28;17(1):7. doi: 10.1186/s13040-024-00358-0.
Epistasis, the interaction between two or more genes, is integral to the study of genetics and is present throughout nature. Yet, it is seldom fully explored as most approaches primarily focus on single-locus effects, partly because analyzing all pairwise and higher-order interactions requires significant computational resources. Furthermore, existing methods for epistasis detection only consider a Cartesian (multiplicative) model for interaction terms. This is likely limiting as epistatic interactions can evolve to produce varied relationships between genetic loci, some complex and not linearly separable.
We present new algorithms for computing the interaction coefficients of standard regression models for epistasis. The algorithms permit many varied models for the interaction terms between loci and use memory efficiently. They are given for two-way and three-way epistasis and may be generalized to higher-order epistasis. Statistical tests for the interaction coefficients are also provided. We also present an efficient matrix-based algorithm for permutation testing for two-way epistasis. We offer a proof and experimental evidence that methods that look for epistasis only at loci that have main effects may not be justified. Given the computational efficiency of the algorithms, we applied the method to a rat data set and a mouse data set, each with at least 10,000 loci and 1,000 samples, using the standard Cartesian model and the XOR model to explore body mass index.
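As an illustration of what swapping the interaction model means in a standard regression for two-way epistasis, the following minimal sketch fits y = b0 + b1*x1 + b2*x2 + b3*f(x1, x2) by ordinary least squares and reports a t statistic for b3. It is not the authors' algorithm. The function names, the simulated data, and in particular the parity encoding used for XOR on 0/1/2 genotypes are assumptions made for this example, and the paper's exact XOR definition may differ.

```python
import numpy as np

def cartesian(x1, x2):
    # Standard multiplicative (Cartesian) interaction term.
    return x1 * x2

def xor_parity(x1, x2):
    # Assumed XOR-style encoding for 0/1/2 genotypes: parity of the sum.
    return (x1 + x2) % 2

def interaction_t_stat(y, x1, x2, interaction):
    """Fit y ~ 1 + x1 + x2 + interaction(x1, x2) by OLS.

    Returns the interaction coefficient and its t statistic.
    """
    X = np.column_stack([np.ones(len(y)), x1, x2, interaction(x1, x2)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the coefficients
    return beta[3], beta[3] / np.sqrt(cov[3, 3])

# Simulated (hypothetical) data: the phenotype depends on the XOR term only.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.integers(0, 3, size=n)
x2 = rng.integers(0, 3, size=n)
y = 0.8 * xor_parity(x1, x2) + rng.normal(size=n)

for name, f in [("cartesian", cartesian), ("xor", xor_parity)]:
    b3, t = interaction_t_stat(y, x1, x2, f)
    print(f"{name}: b3 = {b3:.3f}, t = {t:.2f}")
```

In this simulation neither locus has a linear main effect, yet the pair interacts, which is the kind of signal a screen restricted to loci with main effects could miss. Scanning all pairs this way costs one small regression per pair; the efficiency gains described above come from the authors' own algorithms, which this sketch does not reproduce.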
This study reveals that although many of the loci found to exhibit significant statistical epistasis overlap between models in rats, the pairs are mostly distinct. Further, the XOR model found greater evidence for statistical epistasis in many more pairs of loci in both data sets with almost all significant epistasis in mice identified using XOR. In the rat data set, loci involved in epistasis under the XOR model are enriched for biologically relevant pathways.
Our results in both species show that many biologically relevant epistatic relationships would have gone undetected if only one interaction model had been applied, providing evidence that varied interaction models should be implemented to explore epistatic interactions that occur in living systems.