School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, China.
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, China.
Interdiscip Sci. 2024 Sep;16(3):688-711. doi: 10.1007/s12539-024-00621-2. Epub 2024 Jul 2.
To elucidate the genetic basis of complex diseases, it is crucial to discover the single-nucleotide polymorphisms (SNPs) that contribute to disease susceptibility. This is particularly challenging for high-order SNP epistatic interactions (HEIs), in which each SNP has a small individual effect but the SNPs together may have a large joint effect. These interactions are difficult to detect because the search space is vast, encompassing billions of possible combinations, and because evaluating each combination is computationally expensive. This study proposes a novel explicit-encoding-based multitasking harmony search algorithm (MTHS-EE-DHEI) designed to address this challenge. The algorithm operates in three stages. First, a harmony search algorithm uses four lightweight evaluation functions, including Bayesian-network-based and entropy-based functions, to efficiently explore SNP combinations potentially associated with disease status. Second, a G-test is applied to filter out statistically insignificant SNP combinations. Finally, two machine learning-based methods, multifactor dimensionality reduction (MDR) and random forest (RF), are used to validate the classification performance of the remaining significant SNP combinations. This research aims to demonstrate the effectiveness of MTHS-EE-DHEI in identifying HEIs compared with existing methods, potentially providing valuable insights into the genetic architecture of complex diseases. The performance of MTHS-EE-DHEI was evaluated on twenty simulated disease datasets and three real-world datasets covering age-related macular degeneration (AMD), rheumatoid arthritis (RA), and breast cancer (BC). The results indicate that MTHS-EE-DHEI outperforms four state-of-the-art algorithms in both detection power and computational efficiency. The source code is available at https://github.com/shouhengtuo/MTHS-EE-DHEI.git .
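The second stage, the G-test filter, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository for that); it only shows the standard G statistic, G = 2 Σ O ln(O/E), computed on a contingency table of genotype combination versus disease status. The function name `g_test`, the genotype coding (0, 1, 2), and the input layout are assumptions made for illustration.

```python
import math
from collections import Counter


def g_test(genotypes, labels):
    """Return (G statistic, degrees of freedom) for one SNP combination.

    genotypes: one sequence per sample holding the genotype codes
               (assumed 0/1/2) of the SNPs in the combination.
    labels:    one disease-status value per sample (assumed 0 = control,
               1 = case).
    Hypothetical helper for illustration, not taken from MTHS-EE-DHEI.
    """
    if len(genotypes) != len(labels):
        raise ValueError("genotypes and labels must have the same length")
    n = len(labels)
    combos = [tuple(g) for g in genotypes]

    cell = Counter(zip(combos, labels))  # observed count per (combination, status)
    row = Counter(combos)                # marginal count per genotype combination
    col = Counter(labels)                # marginal count per disease status

    g = 0.0
    for (combo, status), observed in cell.items():
        # Expected count under independence of combination and status.
        expected = row[combo] * col[status] / n
        # Cells with zero observations contribute nothing and never
        # appear in the Counter, so log(0) cannot occur here.
        g += 2.0 * observed * math.log(observed / expected)

    # Degrees of freedom use only the combinations actually observed.
    df = (len(row) - 1) * (len(col) - 1)
    return g, df
```

Converting G to a p-value requires a chi-square tail probability with `df` degrees of freedom, for example `scipy.stats.chi2.sf(g, df)` if SciPy is available. A combination would then be kept only if its p-value falls below a chosen threshold; the threshold and any multiple-testing correction are not specified in the abstract.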