Luo Yongyi, Zhang Zhen, Wang Shu, Shi Jiandong, Hao Jingyu, Lian Sheng, Hu Taobo, Ishibashi Toyotaka, Wang Depeng, Yu Weichuan, Fan Xiaodan
Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong 999077, China.
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong 999077, China.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf095.
Genomic variations, including single-nucleotide polymorphisms, small insertions and deletions, and structural variations, are crucial for understanding evolution and disease. However, comprehensive simulation tools for benchmarking genomic analysis methods are lacking. Existing simulators do not accurately represent the nonuniform distribution and length patterns of structural variations in human genomes, and simulating complex structural variations remains challenging.
We present BVSim, a flexible tool for probabilistic simulation of genomic variations that focuses primarily on human patterns while accommodating other species. BVSim simulates simple and complex structural variations as well as small variants by mimicking empirical variation distributions, which often show higher frequencies near telomeres and within tandem repeat regions. Notably, users can supply one or more benchmark samples aligned to any reference genome, from which BVSim summarizes and reproduces the distribution of structural variation positions and lengths specific to that species. Its compatibility with standard file formats allows it to be integrated into existing genomic research workflows, making it a practical resource for benchmarking downstream tools such as variant callers. In numerical experiments, BVSim generated more realistic sequences that differed significantly from the outputs of other simulators.
BVSim is written in Python and freely available to noncommercial users under the GPL3 license. Source code, application guide, and toy examples are provided on the GitHub page at https://github.com/YongyiLuo98/BVSim. The tool is registered in SciCrunch (RRID:SCR_026926), bio.tools (biotools:BVSim), and WorkflowHub (doi:10.48546/WORKFLOWHUB.WORKFLOW.1361.1).
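The idea of learning position and length patterns from benchmark samples can be illustrated with a small sketch. This is not BVSim's code or API: the function names, the fixed-width binning, and the add-one pseudo-count are all assumptions made for illustration. It bins benchmark structural variation positions along one chromosome, samples new positions in proportion to the per-bin counts, and draws lengths from the observed lengths.

```python
import random
from collections import Counter


def fit_bin_weights(benchmark_positions, chrom_length, bin_size):
    """Count benchmark SV positions per fixed-width bin (hypothetical helper).

    One pseudo-count is added to every bin (an assumption) so that bins
    without any benchmark SV can still be sampled occasionally.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in benchmark_positions)
    return [counts.get(i, 0) + 1 for i in range(n_bins)]


def simulate_svs(benchmark_positions, benchmark_lengths,
                 chrom_length, bin_size, n_svs, seed=0):
    """Sample (position, length) pairs that follow the benchmark patterns.

    Positions are 0-based. Overlaps between simulated SVs and SV types
    are deliberately ignored in this sketch.
    """
    rng = random.Random(seed)
    weights = fit_bin_weights(benchmark_positions, chrom_length, bin_size)
    # Choose a bin for each SV in proportion to the smoothed benchmark counts.
    bins = rng.choices(range(len(weights)), weights=weights, k=n_svs)
    svs = []
    for b in bins:
        start = b * bin_size
        end = min(start + bin_size, chrom_length)
        pos = rng.randrange(start, end)         # uniform within the chosen bin
        length = rng.choice(benchmark_lengths)  # empirical length distribution
        svs.append((pos, length))
    return sorted(svs)


if __name__ == "__main__":
    # Toy benchmark: SVs clustered near both ends of a 1,000 bp "chromosome".
    positions = [5, 12, 30, 41, 950, 962, 980, 500]
    lengths = [50, 60, 75, 300, 1200]
    for pos, length in simulate_svs(positions, lengths, 1000, 100, 5):
        print(pos, length)
```

With the toy input, most simulated positions fall in the first and last bins, mirroring the telomere-enriched pattern described in the abstract. A realistic simulator would also need to handle variant types, overlaps, and sequence-level edits.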