Van Hecke Marie, Marchal Kathleen
IDLab, Department of Information Technology, Ghent University-imec, 9052 Ghent, Belgium.
Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium.
Bioinform Adv. 2025 Jul 30;5(1):vbaf183. doi: 10.1093/bioadv/vbaf183. eCollection 2025.
Smart-seq3 is a full-length single-cell RNA sequencing protocol that enables transcript-level quantification and splicing analysis by preserving unique molecular identifier (UMI) information. However, benchmarking computational tools for isoform reconstruction and splicing quantification remains challenging because ground-truth datasets are lacking. Here, we present smartSim, a Smart-seq3 read simulator designed to generate realistic sequencing data that reflect the complexities of single-cell transcriptomics.
smartSim simulates known and novel splicing events, generates both UMI-containing and internal reads, and mimics protocol-specific biases by leveraging empirical data distributions. Our results show that smartSim-generated data closely resemble real Smart-seq3 datasets in terms of fragment length distributions, internal read counts, and read quality scores. smartSim outputs raw sequencing reads in FASTQ format, making it compatible with both genome- and transcriptome-based alignment tools. By extending simulation beyond gene-level quantification, smartSim provides a resource for evaluating and improving computational methods for alternative splicing detection and isoform reconstruction in single-cell RNA sequencing.
smartSim is available at https://github.com/MarchalLab/smartSim.
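The read-generation idea summarized in the abstract (fragment lengths drawn from an empirical distribution, one UMI-containing 5' read plus internal reads per molecule, FASTQ output) can be illustrated with a minimal sketch. This is not smartSim's implementation. The function names, the constant quality string, and the read layout (an 11-bp tag, an 8-bp UMI, and three G bases ahead of the cDNA sequence) are assumptions made for illustration only, and splicing events and protocol-specific biases are not modeled.

```python
import random

TAG = "ATTGCGCAATG"  # assumed 11-bp tag marking UMI-containing reads
UMI_LEN = 8          # assumed UMI length


def random_dna(n, rng):
    """Return a random DNA string of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def simulate_molecule(transcript, empirical_lengths, n_internal, read_len, rng):
    """Simulate reads for one molecule (hypothetical helper).

    Returns the molecule's UMI and a list of (kind, sequence) tuples:
    one "umi" read from the 5' end followed by n_internal "internal" reads.
    """
    umi = random_dna(UMI_LEN, rng)
    reads = []

    # 5' fragment: starts at transcript position 0 and carries tag + UMI + GGG.
    frag_len = min(rng.choice(empirical_lengths), len(transcript))
    five_prime = TAG + umi + "GGG" + transcript[:frag_len]
    reads.append(("umi", five_prime[:read_len]))

    # Internal fragments: uniform random start, length drawn from the
    # observed fragment lengths, no UMI attached.
    for _ in range(n_internal):
        frag_len = min(rng.choice(empirical_lengths), len(transcript))
        start = rng.randint(0, len(transcript) - frag_len)
        fragment = transcript[start:start + frag_len]
        reads.append(("internal", fragment[:read_len]))

    return umi, reads


def to_fastq(name, seq, qual_char="I"):
    """Format one FASTQ record with a constant placeholder quality string."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


if __name__ == "__main__":
    rng = random.Random(0)
    transcript = random_dna(1000, rng)   # stand-in for a real transcript
    observed_lengths = [250, 300, 400]   # stand-in for empirical lengths
    umi, reads = simulate_molecule(transcript, observed_lengths, 3, 75, rng)
    for i, (kind, seq) in enumerate(reads):
        print(to_fastq(f"mol0_{kind}_{i}_UMI:{umi}", seq), end="")
```

A realistic simulator would replace the constant quality string with scores sampled from real data and would sample fragment positions with the protocol's coverage bias. Those steps are omitted here to keep the sketch short.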