Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA.
Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae164.
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need, we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or CAMPAREE-simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM and BAM outputs containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
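To make two of the effects named above concrete (GC-content bias and copy errors during PCR amplification), the following is a minimal toy sketch. It is not BEERS2's implementation or API: the function names, the linear GC penalty, and all parameter values are assumptions chosen only for illustration.

```python
import random


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def copy_probability(seq: str, base_efficiency: float = 0.9,
                     gc_penalty: float = 1.0) -> float:
    """Hypothetical per-cycle duplication probability.

    Assumption: efficiency peaks at 50% GC and falls off linearly as
    GC content moves toward either extreme. Real bias curves differ.
    """
    p = base_efficiency - gc_penalty * abs(gc_fraction(seq) - 0.5)
    return min(1.0, max(0.0, p))


def copy_with_errors(seq: str, error_rate: float, rng: random.Random) -> str:
    """Copy a molecule, substituting each base with probability error_rate."""
    copied = []
    for base in seq:
        if rng.random() < error_rate:
            # Replace with one of the three other bases, chosen uniformly.
            copied.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def pcr_amplify(molecules, cycles: int, error_rate: float = 1e-4,
                seed: int = 0):
    """Toy PCR: in each cycle every molecule may be duplicated once.

    Duplication succeeds with a GC-dependent probability, and each new
    copy may carry substitution errors that later copies inherit.
    """
    rng = random.Random(seed)
    pool = list(molecules)
    for _ in range(cycles):
        new_copies = []
        for seq in pool:
            if rng.random() < copy_probability(seq):
                new_copies.append(copy_with_errors(seq, error_rate, rng))
        pool.extend(new_copies)
    return pool


if __name__ == "__main__":
    start = ["ATGCATGCAT"] * 10 + ["GGCCGGCCGG"] * 10
    pool = pcr_amplify(start, cycles=8)
    print("balanced-GC copies:", sum(gc_fraction(s) < 0.6 for s in pool))
    print("high-GC copies:", sum(gc_fraction(s) >= 0.6 for s in pool))
```

Under these assumed parameters, fragments with extreme GC content are duplicated less often per cycle, so their share of the pool shrinks as cycles accumulate, and substitutions introduced in an early cycle propagate to all descendants of that copy.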