Department of Statistics, Rice University, Houston, Texas, USA.
Division of Cancer Control and Population Sciences, Statistical Research and Applications Branch, Surveillance Research Program, National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, USA.
Genet Epidemiol. 2021 Mar;45(2):131-141. doi: 10.1002/gepi.22362. Epub 2020 Oct 16.
In silico simulations play an indispensable role in the development and application of statistical models and methods for genetic studies. Simulation tools allow methods to be evaluated and models to be investigated in a controlled manner. With the growing popularity of evolutionary models and simulation-based statistical methods, genetic simulations have been applied in a wide variety of research disciplines, such as population genetics, evolutionary genetics, genetic epidemiology, ecology, and conservation biology. In this review, we surveyed 1409 articles from five journals that publish on major application areas of genetic simulations. We identified 432 papers in which genetic simulations were used and examined the targets and applications of the simulation studies, as well as how the simulation methods and simulated data sets were reported and shared. Although a large proportion (30%) of the surveyed articles reported the use of genetic simulations, only 28% of these genetic simulation studies used existing simulation software and 2% used existing simulated data sets; 19% and 12% made source code and simulated data sets publicly available, respectively. Moreover, 15% of articles provided no information on how the simulation studies were performed. These findings suggest a need to encourage the sharing and reuse of existing simulation software and data sets, and to report more fully how simulations are performed.