Shang Junliang, Xu Anqi, Bi Mingyuan, Zhang Yuanyuan, Li Feng, Liu Jin-Xing
School of Computer Science, Qufu Normal University, Rizhao 276826, China.
School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China.
Brief Funct Genomics. 2024 Dec 6;23(6):745-753. doi: 10.1093/bfgp/elae034.
Genome-wide association studies (GWAS) are essential for investigating the genetic basis of complex diseases; however, they usually ignore interactions among multiple single nucleotide polymorphisms (SNPs). Genome-wide interaction studies provide a crucial means of exploring complex genetic interactions that GWAS may miss. Although many interaction methods have been proposed, challenges persist, including a lack of epistasis models and inconsistent benchmark datasets. SNP data simulation is a pivotal intermediary between interaction methods and real applications. It is therefore important to obtain epistasis models and benchmark datasets from simulation tools, which helps to further improve interaction methods. Many simulation tools are now widely used in population genetics. According to their basic principles, existing tools can be divided into four categories: coalescent simulation, forward-time simulation, resampling simulation, and other simulation frameworks. This paper compares and analyzes their basic principles and representative simulation tools in detail. It also discusses and summarizes the advantages and disadvantages of these frameworks and tools, offering technical insights for the design of new methods and serving as a valuable reference for researchers seeking a comprehensive understanding of GWAS and genome-wide interaction studies.
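To make the idea of simulating a benchmark dataset from an epistasis model concrete, the following is a minimal sketch and not the method of any tool covered by the review. It assumes independent biallelic SNPs in Hardy-Weinberg equilibrium and assigns case or control status from a two-locus penetrance table. The function names, the minor allele frequencies, and the XOR-like penetrance values are all hypothetical and chosen only for illustration.

```python
import random


def simulate_genotype(maf, rng):
    """Draw one genotype (0, 1, or 2 minor alleles) under Hardy-Weinberg equilibrium."""
    # Each of the two alleles is the minor allele with probability maf.
    return int(rng.random() < maf) + int(rng.random() < maf)


def simulate_dataset(n_samples, mafs, penetrance, causal=(0, 1), seed=0):
    """Simulate genotypes and binary disease status for n_samples individuals.

    mafs       : minor allele frequency of each SNP (assumed independent SNPs)
    penetrance : 3x3 table, penetrance[a][b] = P(disease | genotypes a, b)
                 at the two causal SNPs (hypothetical epistasis model)
    causal     : indices of the two interacting SNPs
    """
    rng = random.Random(seed)  # seeded generator for reproducible benchmarks
    rows = []
    for _ in range(n_samples):
        genotypes = [simulate_genotype(maf, rng) for maf in mafs]
        a, b = genotypes[causal[0]], genotypes[causal[1]]
        status = int(rng.random() < penetrance[a][b])  # 1 = case, 0 = control
        rows.append((genotypes, status))
    return rows


# Hypothetical XOR-like table: risk depends on the genotype combination,
# so neither causal SNP is strongly informative when examined alone.
PENETRANCE = [
    [0.1, 0.5, 0.1],
    [0.5, 0.1, 0.5],
    [0.1, 0.5, 0.1],
]

data = simulate_dataset(1000, mafs=[0.3, 0.3, 0.2, 0.4],
                        penetrance=PENETRANCE, seed=42)
n_cases = sum(status for _, status in data)
print(f"{n_cases} cases, {len(data) - n_cases} controls")
```

Fixing the seed makes the dataset reproducible, which is one simple way to address inconsistency between benchmark datasets. This sketch ignores linkage disequilibrium and population history, which the coalescent, forward-time, and resampling frameworks named above are designed to model.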