Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA, 50011, USA.
Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA.
BMC Bioinformatics. 2020 Oct 1;21(1):429. doi: 10.1186/s12859-020-03751-8.
PacBio sequencing is a valuable third-generation DNA sequencing method because of its very long read lengths, its ability to detect methylated bases, and its real-time sequencing methodology. Until now, however, no tool has been available for assessing the quality of, subsampling, and filtering PacBio data.
Here we present SequelTools, a command-line program containing three tools: Quality Control, Read Subsampling, and Read Filtering. The Quality Control tool quickly processes PacBio Sequel raw sequence data from multiple SMRTcells, producing statistics and publication-quality plots that describe data quality, including N50, read length and count statistics, PSR, and ZOR. The Read Subsampling tool lets the user subsample reads by one or both of the following criteria: longest subread per CLR or random CLR selection. The Read Filtering tool provides options for normalizing data by removing certain low-quality scraps reads and/or reads below a minimum CLR length. SequelTools is implemented in bash, R, and Python using only standard libraries and packages, and it is platform independent.
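To make two of the quantities above concrete, the following is a minimal Python sketch of how N50 and a "longest subread per CLR" selection could be computed. It is an illustration only, not SequelTools' own code: the function names `n50` and `longest_subread_per_clr` are hypothetical, and the sketch assumes subread names follow the usual PacBio `movie/zmw/start_end` pattern, with subread length taken as `end - start`.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("n50() requires at least one read length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def longest_subread_per_clr(subread_names):
    """Keep the longest subread for each CLR (one CLR per movie/ZMW pair).

    Assumes names of the form 'movie/zmw/start_end' (hypothetical input;
    a real workflow would read these from the subreads BAM file).
    """
    best = {}
    for name in subread_names:
        movie, zmw, span = name.split("/")
        start, end = (int(x) for x in span.split("_"))
        key = (movie, zmw)
        # Replace the stored subread only if this one is strictly longer.
        if key not in best or end - start > best[key][1]:
            best[key] = (name, end - start)
    return {key: name for key, (name, _) in best.items()}


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(longest_subread_per_clr(["m1/10/0_500", "m1/10/550_2000", "m1/11/0_300"]))
```

Sorting once and accumulating from the longest read downward keeps the N50 computation at O(n log n), which matters little for a sketch but scales to the read counts of a full SMRTcell.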
SequelTools provides the only free, fast, and easy-to-use quality control tool for PacBio Sequel raw sequence data, and it is the only program offering this kind of read subsampling and read filtering for such data. It is available at https://github.com/ISUgenomics/SequelTools .