smartSim: simulation of splice aware single cell smart-seq3 data.

Author information

Van Hecke Marie, Marchal Kathleen

Affiliations

IDLab, Department of Information Technology, Ghent University-imec, 9052 Ghent, Belgium.

Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium.

Publication information

Bioinform Adv. 2025 Jul 30;5(1):vbaf183. doi: 10.1093/bioadv/vbaf183. eCollection 2025.

Abstract

MOTIVATION

Smart-seq3 is a powerful full-length single-cell RNA sequencing protocol that enables transcript-level quantification and splicing analysis by preserving unique molecular identifier (UMI) information. However, benchmarking computational tools for isoform reconstruction and splicing quantification remains challenging due to the lack of ground truth datasets. Herein, we present smartSim, a Smart-seq3 read simulator designed to generate realistic sequencing data that accurately reflects the complexities of single-cell transcriptomics.

RESULTS

smartSim simulates known and novel splicing events, generates both UMI-containing and internal reads, and mimics protocol-specific biases by leveraging empirical data distributions. Our results show that smartSim-generated data closely resembles real Smart-seq3 datasets in terms of fragment length distributions, internal read counts, and read quality scores. It generates raw sequencing reads in FASTQ format, making it compatible with both genome- and transcriptome-based alignment tools. By extending simulation beyond gene-level quantification, smartSim provides a crucial resource for evaluating and improving computational methods for alternative splicing detection and isoform reconstruction in single-cell RNA sequencing.
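To make the distinction between UMI-containing and internal reads concrete, here is a minimal toy sketch in Python. It is not smartSim's implementation. It assumes the commonly described Smart-seq3 layout for 5' reads (an 11 bp tag, an 8 bp UMI, then GGG) and uses a constant placeholder quality string where the abstract says smartSim draws on empirical distributions. All function names and parameters are hypothetical.

```python
import random

# Assumption: Smart-seq3 5' reads begin with an 11 bp tag, an 8 bp UMI and GGG.
TAG = "ATTGCGCAATG"
UMI_LEN = 8
SPACER = "GGG"


def random_seq(length, rng):
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_read(transcript, read_len, umi_read, rng):
    """Return (sequence, umi) for one toy read drawn from `transcript`.

    UMI reads carry tag + UMI + spacer followed by the transcript's 5' end;
    internal reads are a random window of the transcript and have no UMI.
    """
    if umi_read:
        umi = random_seq(UMI_LEN, rng)
        prefix = TAG + umi + SPACER
        return prefix + transcript[: read_len - len(prefix)], umi
    start = rng.randrange(0, len(transcript) - read_len + 1)
    return transcript[start:start + read_len], None


def fastq_record(name, seq, qual_char="I"):
    """Format one four-line FASTQ record with a constant placeholder quality."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


if __name__ == "__main__":
    rng = random.Random(0)
    transcript = random_seq(300, rng)  # stand-in for a spliced isoform sequence
    umi_seq, umi = simulate_read(transcript, 50, True, rng)
    internal_seq, _ = simulate_read(transcript, 50, False, rng)
    print(fastq_record(f"umi_read_{umi}", umi_seq), end="")
    print(fastq_record("internal_read", internal_seq), end="")
```

In a splice-aware setting, `transcript` would be the exon-concatenated sequence of a known or novel isoform, so reads spanning exon junctions arise naturally from the window sampling.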

AVAILABILITY AND IMPLEMENTATION

smartSim is available at https://github.com/MarchalLab/smartSim.
