SimLoRD: Simulation of Long Read Data.

Authors

Stöcker Bianca K, Köster Johannes, Rahmann Sven

Affiliations

Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, Essen, 45147, Germany.

Life Sciences, Centrum Wiskunde & Informatica (CWI), Amsterdam 1098 XG, The Netherlands.

Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA.

Publication information

Bioinformatics. 2016 Sep 1;32(17):2704-6. doi: 10.1093/bioinformatics/btw286. Epub 2016 May 10.

DOI:10.1093/bioinformatics/btw286

PMID:27166244

Abstract

MOTIVATION

Third generation sequencing methods provide longer reads than second generation methods and have distinct error characteristics. While there exist many read simulators for second generation data, there is a very limited choice for third generation data.

RESULTS

We analyzed public data from Pacific Biosciences (PacBio) SMRT sequencing, developed an error model and implemented it in a new read simulator called SimLoRD. It offers options to choose the read length distribution and to model error probabilities depending on the number of passes through the sequencer. The new error model makes SimLoRD the most realistic SMRT read simulator available.

AVAILABILITY AND IMPLEMENTATION

SimLoRD is available open source at http://bitbucket.org/genomeinformatics/simlord/ and installable via Bioconda (http://bioconda.github.io).

CONTACT

Bianca.Stoecker@uni-due.de or Sven.Rahmann@uni-due.de

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
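The RESULTS paragraph above describes two ideas: drawing read lengths from a chosen distribution, and letting the per-base error probability depend on the number of passes. The following is a minimal toy sketch of those two ideas only. It is not SimLoRD's code or its published error model: the log-normal length parameters, the uniform choice of pass count, the `1/passes` decay, and the names `error_rate` and `simulate_read` are all illustrative assumptions.

```python
import random

BASES = "ACGT"


def error_rate(passes, single_pass_rate=0.15, floor=0.01):
    """Toy assumption: per-base error shrinks as 1/passes, never below `floor`."""
    return max(floor, single_pass_rate / passes)


def simulate_read(reference, rng, mean_log_len=5.0, sigma_log_len=0.4, max_passes=8):
    """Return (read, passes, p_err) for one simulated read from `reference`."""
    # Read length from a log-normal distribution, clipped to [1, len(reference)].
    drawn = int(rng.lognormvariate(mean_log_len, sigma_log_len))
    length = min(len(reference), max(1, drawn))
    start = rng.randrange(len(reference) - length + 1)

    # Hypothetical: pass count is uniform here; a real model would tie it
    # to the read and insert lengths.
    passes = rng.randint(1, max_passes)
    p_err = error_rate(passes)

    read = []
    for base in reference[start:start + length]:
        if rng.random() >= p_err:
            read.append(base)  # base copied without error
            continue
        kind = rng.choice(("ins", "del", "sub"))
        if kind == "ins":
            read.append(base)
            read.append(rng.choice(BASES))  # extra inserted base
        elif kind == "sub":
            read.append(rng.choice([b for b in BASES if b != base]))
        # "del": the base is dropped, so nothing is appended
    return "".join(read), passes, p_err


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice(BASES) for _ in range(5000))
    read, passes, p_err = simulate_read(reference, rng)
    print(f"length={len(read)} passes={passes} p_err={p_err:.3f}")
```

In this sketch a read with more passes gets a lower error probability, which mirrors the qualitative behaviour the abstract describes; the actual functional form and parameters would have to be taken from the paper and its supplementary data.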

Similar articles

SimLoRD: Simulation of Long Read Data.

Bioinformatics. 2016 Sep 1;32(17):2704-6. doi: 10.1093/bioinformatics/btw286. Epub 2016 May 10.

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.

BMC Bioinformatics. 2018 May 22;19(1):177. doi: 10.1186/s12859-018-2208-0.

PaSS: a sequencing simulator for PacBio sequencing.

BMC Bioinformatics. 2019 Jun 21;20(1):352. doi: 10.1186/s12859-019-2901-7.

PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores.

Bioinformatics. 2021 May 5;37(5):589-595. doi: 10.1093/bioinformatics/btaa835.

Improving the sensitivity of long read overlap detection using grouped short k-mer matches.

BMC Genomics. 2019 Apr 4;20(Suppl 2):190. doi: 10.1186/s12864-019-5475-x.

Evaluation of tools for long read RNA-seq splice-aware alignment.

Bioinformatics. 2018 Mar 1;34(5):748-754. doi: 10.1093/bioinformatics/btx668.

LRCstats, a tool for evaluating long reads correction methods.

Bioinformatics. 2017 Nov 15;33(22):3652-3654. doi: 10.1093/bioinformatics/btx489.

SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data.

BMC Bioinformatics. 2014 Feb 5;15:40. doi: 10.1186/1471-2105-15-40.

PBSIM: PacBio reads simulator--toward accurate genome assembly.

Bioinformatics. 2013 Jan 1;29(1):119-21. doi: 10.1093/bioinformatics/bts649. Epub 2012 Nov 4.

RepLong: de novo repeat identification using long read sequencing data.

Bioinformatics. 2018 Apr 1;34(7):1099-1107. doi: 10.1093/bioinformatics/btx717.

Cited by

hafoe: an interactive tool for the analysis of chimeric AAV libraries after random mutagenesis.

Gene Ther. 2025 Jul 8. doi: 10.1038/s41434-025-00548-3.

A review of neural networks for metagenomic binning.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf065.

Haplotype-resolved assembly of diploid and polyploid genomes using quantum computing.

Cell Rep Methods. 2024 May 20;4(5):100754. doi: 10.1016/j.crmeth.2024.100754. Epub 2024 Apr 12.

MBE: model-based enrichment estimation and prediction for differential sequencing data.

Genome Biol. 2023 Oct 2;24(1):218. doi: 10.1186/s13059-023-03058-w.

Nanopore sequencing of PCR products enables multicopy gene family reconstruction.

Comput Struct Biotechnol J. 2023 Jul 16;21:3656-3664. doi: 10.1016/j.csbj.2023.07.012. eCollection 2023.

TRcaller: a novel tool for precise and ultrafast tandem repeat variant genotyping in massively parallel sequencing reads.

Front Genet. 2023 Jul 18;14:1227176. doi: 10.3389/fgene.2023.1227176. eCollection 2023.

HMMPolish: a coding region polishing tool for TGS-sequenced RNA viruses.

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad264.

SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i270-i278. doi: 10.1093/bioinformatics/btad237.

Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans.

PLoS One. 2022 Dec 30;17(12):e0278424. doi: 10.1371/journal.pone.0278424. eCollection 2022.

PBSIM3: a simulator for all types of PacBio and ONT long reads.

NAR Genom Bioinform. 2022 Dec 1;4(4):lqac092. doi: 10.1093/nargab/lqac092. eCollection 2022 Dec.
