Epi-SSA: A novel epistasis detection method based on a multi-objective sparrow search algorithm.

Affiliations

College of Computer Science and Technology, Changchun University, Changchun City, Jilin Province, China.

School of Cultural and Media Studies, Changchun University of Science and Technology, Changchun City, Jilin Province, China.

Publication information

PLoS One. 2024 Oct 24;19(10):e0311223. doi: 10.1371/journal.pone.0311223. eCollection 2024.

Abstract

Genome-wide association studies typically consider epistatic interactions a crucial factor in exploring complex diseases. However, current methods concentrate primarily on detecting second-order epistatic interactions, and their accuracy is limited. In this work, we introduce a novel method called Epi-SSA, which is better suited to detecting high-order epistatic interactions. Epi-SSA draws inspiration from the sparrow search algorithm and optimizes the population against multiple objective functions in each iteration, so that it identifies epistatic interactions more precisely. To evaluate its performance, we conducted a comprehensive comparison between Epi-SSA and seven other methods using five simulation datasets: DME 100, DNME 100, DME 1000, DNME 1000, and DNME3 100. The DME 100 dataset comprises eight second-order epistasis disease models with marginal effects; each model has 100 simulated data instances, and each instance contains 100 SNPs with 800 case and 800 control samples. The DNME 100 dataset comprises eight second-order epistasis disease models without marginal effects and is otherwise identical in structure to DME 100. Experiments on the DME 100 and DNME 100 datasets were designed to evaluate the algorithms' capacity to detect epistasis across varying disease models. The DME 1000 and DNME 1000 datasets increase the complexity to 1000 SNPs per simulated data instance while retaining the other properties of DME 100 and DNME 100. These experiments gauge the algorithms' adaptability in detecting epistasis as the number of SNPs in the data increases. The DNME3 100 dataset raises the complexity further with six third-order epistasis disease models, otherwise paralleling the structure of DNME 100, and tests the algorithms' proficiency in identifying higher-order epistasis.
The highest average F-measures achieved by the seven other existing methods on the five datasets are 0.86, 0.86, 0.41, 0.56, and 0.79, respectively, while the average F-measures of Epi-SSA on the same datasets are 0.92, 0.97, 0.79, 0.86, and 0.97, respectively. The experimental results demonstrate that the Epi-SSA algorithm outperforms the other methods in a variety of epistasis detection tasks. As the number of SNPs in the dataset increases and the order of epistasis rises, the advantages of the Epi-SSA algorithm become increasingly pronounced. In addition, we applied Epi-SSA to the WTCCC dataset and uncovered numerous genes and gene pairs that might play a significant role in the pathogenesis of seven complex diseases. Notably, some of these genes have related reports in the Comparative Toxicogenomics Database (CTD). Epi-SSA is a potent tool for detecting epistatic interactions and aids further understanding of the pathogenesis of common and complex diseases. The source code of Epi-SSA can be obtained at https://osf.io/6sqwj/.
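
The abstract reports average F-measures but does not say how true positives, false positives, and false negatives are counted per dataset. As a minimal sketch, assuming the standard definition (the harmonic mean of precision and recall), the per-model score and its average could be computed as follows. The function names `f_measure` and `mean_f_measure` are hypothetical and are not taken from the Epi-SSA source code.

```python
from typing import Iterable, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Standard F-measure: harmonic mean of precision and recall.

    Assumption: tp, fp and fn are counts of detected SNP combinations
    for one disease model; the abstract does not specify the counting rule.
    Returns 0.0 when there are no true positives (score undefined or zero).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f_measure(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Average the F-measure over several disease models.

    Each item in counts is a (tp, fp, fn) tuple for one model.
    """
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in counts]
    return sum(scores) / len(scores)
```

For instance, `mean_f_measure([(90, 10, 10), (0, 5, 5)])` averages one model with precision and recall of 0.9 and one model with no correct detections. The counts here are made up for illustration and are not results from the paper.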

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e118/11500897/f14e06bc0795/pone.0311223.g001.jpg
