Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
BMC Bioinformatics. 2021 Mar 6;22(1):109. doi: 10.1186/s12859-021-04024-8.
Somatic single nucleotide variants have gained increased attention because of their role in cancer development and the widespread use of high-throughput sequencing techniques. The need to accurately identify these variants in sequencing data has led to a proliferation of somatic variant calling tools. Additionally, using simulated data to assess the performance of these tools has become common practice, as there is no gold standard dataset for benchmarking. However, many existing somatic variant simulation tools are limited, either because they rely on generating entirely synthetic reads derived from a reference genome or because they lack the precise customizability needed for a more focused understanding of single nucleotide variant calling performance.
SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control over the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation; variant simulation, where reads are selected and mutated; and variant evaluation, where SomatoSim summarizes the simulation results.
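To make the three stages concrete, here is a minimal Python sketch of the general idea of spiking an SNV into existing aligned reads at a target variant allele fraction (VAF). It is not SomatoSim's implementation: the `Read` class, the function names, the assumption of ungapped alignments with a 0-based reference start, and the rule of rounding `target_vaf * depth` to the nearest read count are all hypothetical simplifications chosen for illustration. Real BAM handling (CIGAR strings, base qualities, mate pairs) is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal aligned read: ungapped, 0-based start on the reference."""
    name: str
    seq: str
    ref_start: int


def base_at(read: Read, pos: int):
    """Return the read base aligned to reference position `pos`, or None if uncovered."""
    offset = pos - read.ref_start
    return read.seq[offset] if 0 <= offset < len(read.seq) else None


def depth_at(reads, pos: int) -> int:
    """Number of reads covering `pos`."""
    return sum(base_at(r, pos) is not None for r in reads)


def select_positions(reads, candidates, min_depth: int):
    """Stage 1 (sketch): keep candidate positions with at least `min_depth` covering reads."""
    return [p for p in candidates if depth_at(reads, p) >= min_depth]


def simulate_snv(reads, pos: int, alt: str, target_vaf: float, rng: random.Random):
    """Stage 2 (sketch): mutate randomly chosen non-alt reads at `pos` until the alt
    count reaches round(target_vaf * depth). Returns the names of the mutated reads."""
    covering = [r for r in reads if base_at(r, pos) is not None]
    existing_alt = sum(base_at(r, pos) == alt for r in covering)
    wanted_alt = round(target_vaf * len(covering))
    ref_reads = [r for r in covering if base_at(r, pos) != alt]
    # Never mutate a negative number of reads or more reads than are available.
    n_to_mutate = max(0, min(wanted_alt - existing_alt, len(ref_reads)))
    chosen = rng.sample(ref_reads, n_to_mutate)
    for r in chosen:
        off = pos - r.ref_start
        r.seq = r.seq[:off] + alt + r.seq[off + 1:]
    return {r.name for r in chosen}


def observed_vaf(reads, pos: int, alt: str) -> float:
    """Stage 3 (sketch): report the VAF actually achieved at `pos`."""
    bases = [b for b in (base_at(r, pos) for r in reads) if b is not None]
    return sum(b == alt for b in bases) / len(bases) if bases else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    reads = [Read(f"r{i}", "ACGTACGTAC", ref_start=100) for i in range(10)]
    positions = select_positions(reads, [95, 104], min_depth=5)  # [104]
    mutated = simulate_snv(reads, 104, "G", target_vaf=0.3, rng=rng)
    print(positions, len(mutated), observed_vaf(reads, 104, "G"))  # [104] 3 0.3
```

Because the achievable VAF is limited to multiples of 1/depth, the evaluation stage is worth keeping separate: it reports what was actually written into the reads rather than what was requested.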
SomatoSim is a user-friendly tool that offers a high level of customizability for simulating somatic single nucleotide variants. SomatoSim is available at https://github.com/BieseckerLab/SomatoSim.