Freda Philip J, Ye Suyu, Zhang Robert, Moore Jason H, Urbanowicz Ryan J
Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA.
Whiting School of Engineering, Johns Hopkins University, 3400 N. Charles St., Baltimore, 21218, MD, USA.
BioData Min. 2024 Oct 1;17(1):37. doi: 10.1186/s13040-024-00390-0.
Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as "interaction-sensitive" algorithms and uniquely non-exhaustive approach. However, the limitations of RBAs in detecting interactions, particularly those involving multiple loci, have not been thoroughly defined. This study seeks to address this gap by evaluating the efficiency of RBAs in detecting higher-order epistatic interactions. Motivated by previous findings that suggest some RBAs may rank predictive features involved in higher-order epistasis negatively, we explore the potential of absolute value ranking of RBA feature weights as an alternative approach for capturing complex interactions. In this study, we assess the performance of ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information.
Our findings indicate that while RBAs effectively identify lower-order (2-way to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature counts but also by signal noise. Specifically, we observe that RBAs successfully detect fully penetrant 4-way XOR interactions when an absolute value ranking approach is used, but only in datasets with 20 total features.
These results highlight the inherent limitations of current RBAs and underscore the need for the development of Relief-based approaches with enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.
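As an illustration of two ideas in the abstract, the sketch below shows (1) why a fully penetrant XOR interaction carries no single-locus signal and (2) how ranking by absolute value differs from ranking by raw weight. It is a minimal sketch under stated assumptions, not the study's pipeline: loci are simplified to binary values, the function names are hypothetical, and the feature weights are invented for illustration rather than taken from the paper.

```python
from itertools import product


def xor_phenotype(genotypes):
    """Fully penetrant n-way XOR: the class is the parity of the binary loci.

    Assumption: loci are coded 0/1 here for simplicity.
    """
    return sum(genotypes) % 2


def marginal_case_rate(n_loci, locus, value):
    """Fraction of class-1 outcomes among all genotype combinations
    in which a single locus is fixed to the given value."""
    rows = [g for g in product((0, 1), repeat=n_loci) if g[locus] == value]
    return sum(xor_phenotype(g) for g in rows) / len(rows)


def rank_features(weights, use_absolute=False):
    """Return feature names ordered from highest to lowest score.

    With use_absolute=True, strongly negative weights rank as highly
    as strongly positive ones.
    """
    if use_absolute:
        return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
    return sorted(weights, key=lambda name: weights[name], reverse=True)


# Hypothetical weights (illustrative only): P1 and P2 stand for predictive
# loci that received negative weights, N1..N3 for non-predictive loci.
weights = {"P1": -0.30, "P2": -0.28, "N1": 0.02, "N2": -0.01, "N3": 0.05}

if __name__ == "__main__":
    # Under 4-way XOR, fixing any one locus leaves the class rate at 0.5,
    # so no single locus is informative on its own.
    print(marginal_case_rate(4, locus=0, value=0))
    print(marginal_case_rate(4, locus=0, value=1))

    # Raw ranking places the negatively weighted predictive loci last;
    # absolute value ranking places them first.
    print(rank_features(weights))
    print(rank_features(weights, use_absolute=True))
```

In this toy setting, raw ranking would discard P1 and P2 in any top-k selection, while absolute value ranking recovers them. Whether real RBA weights behave this way depends on the dataset, as the results above indicate.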