Department of Integrative Biology, University of Wisconsin, Madison, WI, USA.
Mol Ecol Resour. 2020 Jul;20(4):1132-1140. doi: 10.1111/1755-0998.13173. Epub 2020 May 20.
High-throughput sequencing (HTS) is central to the study of population genomics and has an increasingly important role in constructing phylogenies. Choices in research design for sequencing projects can include a wide range of factors, such as sequencing platform, depth of coverage and bioinformatic tools. Simulating HTS data better informs these decisions, as users can validate software by comparing output to the known simulation parameters. However, current standalone HTS simulators cannot generate variant haplotypes under even somewhat complex evolutionary scenarios, such as recombination or demographic change. This greatly reduces their usefulness for fields such as population genomics and phylogenomics. Here I present the R package jackalope that simply and efficiently simulates (i) sets of variant haplotypes from a reference genome and (ii) reads from both Illumina and Pacific Biosciences platforms. Haplotypes can be simulated using phylogenies, gene trees, coalescent-simulation output, population-genomic summary statistics, and Variant Call Format (VCF) files. jackalope can simulate single-end, paired-end or mate-pair Illumina reads, as well as reads from Pacific Biosciences. These simulations include sequencing errors, mapping qualities, multiplexing and optical/PCR duplicates. The package can read reference genomes from FASTA files and can simulate new ones, and all outputs can be written to standard file formats. jackalope is available for Mac, Windows and Linux systems.