Computing Science Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae051.
Transcriptomic long-read (LR) sequencing is an increasingly cost-effective technology for probing RNA features. Numerous tools have been developed for transcriptomic sequencing tasks such as isoform detection and gene fusion detection. However, the scarcity of gold-standard datasets hinders the benchmarking of these tools, which makes simulation of LR sequencing an important and practical alternative. Existing LR simulators aim to imitate sequencing machine noise and to target specific library protocols, but they omit some important library preparation steps (e.g. PCR) and are difficult to adapt to new and changing library preparation techniques (e.g. single-cell LRs).
We present TKSM, a modular and scalable LR simulator in which each RNA modification step is handled explicitly by a dedicated module. This allows the user to assemble a simulation pipeline from a combination of TKSM modules in order to emulate a specific sequencing design. In addition, all core TKSM modules read and write the same simple format (Molecule Description Format), so users can easily extend TKSM with new modules that target new library preparation steps.
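The modular design described above can be illustrated with a minimal Python sketch. It is not TKSM's code or command-line interface, and the `Molecule` record is a hypothetical stand-in rather than the actual Molecule Description Format. The module names (`polyadenylate`, `truncate`, `pcr`), their parameters, and the simple copy-number model are all assumptions made for illustration. The only idea taken from the text is that every step consumes and produces the same record type, so steps can be chained in any order and new steps can be added without changing existing ones.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Molecule:
    """Hypothetical molecule record (NOT the real Molecule Description Format)."""
    molecule_id: str
    sequence: str
    copies: int = 1


# Every module maps a stream of molecules to a stream of molecules.
Module = Callable[[Iterable[Molecule]], Iterator[Molecule]]


def polyadenylate(tail_length: int) -> Module:
    """Illustrative step: append a fixed-length poly(A) tail."""
    def run(molecules: Iterable[Molecule]) -> Iterator[Molecule]:
        for m in molecules:
            yield Molecule(m.molecule_id, m.sequence + "A" * tail_length, m.copies)
    return run


def truncate(min_keep: int, rng: random.Random) -> Module:
    """Illustrative step: keep a random-length suffix (3' end) of each molecule."""
    def run(molecules: Iterable[Molecule]) -> Iterator[Molecule]:
        for m in molecules:
            n = len(m.sequence)
            keep = rng.randint(min(min_keep, n), n)
            yield Molecule(m.molecule_id, m.sequence[n - keep:], m.copies)
    return run


def pcr(cycles: int, efficiency: float, rng: random.Random) -> Module:
    """Illustrative step: each copy is duplicated per cycle with probability `efficiency`."""
    def run(molecules: Iterable[Molecule]) -> Iterator[Molecule]:
        for m in molecules:
            copies = m.copies
            for _ in range(cycles):
                copies += sum(rng.random() < efficiency for _ in range(copies))
            yield Molecule(m.molecule_id, m.sequence, copies)
    return run


def pipeline(*modules: Module) -> Module:
    """Chain modules so the output stream of one feeds the next."""
    def run(molecules: Iterable[Molecule]) -> Iterator[Molecule]:
        stream: Iterator[Molecule] = iter(molecules)
        for module in modules:
            stream = module(stream)
        return stream
    return run


if __name__ == "__main__":
    rng = random.Random(0)
    simulate = pipeline(polyadenylate(10), truncate(5, rng), pcr(2, 0.8, rng))
    for molecule in simulate([Molecule("tx1", "ACGTACGTAC")]):
        print(molecule)
```

Because each step shares one signature, adding a hypothetical new preparation step (for example, barcode tagging for single-cell protocols) would only require writing one more function of the same shape and passing it to `pipeline`.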
TKSM is available as open-source software at https://github.com/vpc-ccg/tksm.