Shenzhen Bay Laboratory, Shenzhen 518055, China.
Bioinformatics. 2020 Jun 1;36(11):3561-3562. doi: 10.1093/bioinformatics/btaa171.
Next-generation sequencing (NGS) data frequently suffer from poor-quality cycles and adapter contamination and therefore need to be preprocessed before downstream analyses. With the ever-growing throughput and read length of modern sequencers, the preprocessing step has become a bottleneck in data analysis because the performance of current tools falls short. Extra-fast and accurate adapter- and quality-trimming tools for sequencing data preprocessing are therefore still urgently needed.
Ktrim was developed in this work. Its key features include built-in support for the adapters of common library preparation kits, support for user-supplied, customized adapter sequences, support for both paired-end and single-end data, and parallelization to accelerate the analysis. Ktrim was ∼2-18 times faster than current tools and also showed high accuracy when applied to the testing datasets. Ktrim could thus serve as a valuable and efficient tool for short-read NGS data preprocessing.
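To make the two preprocessing steps concrete, the following is a minimal Python sketch of 3' quality trimming followed by 3' adapter trimming on a single read. It is an illustration only and not Ktrim's algorithm or interface: the function names (`trim_quality`, `find_adapter`, `trim_read`), the thresholds and the simple mismatch-counting search are all assumptions made for this example, and qualities are assumed to be Phred+33 encoded.

```python
def trim_quality(seq, qual, min_q=20, phred_offset=33):
    """Drop trailing bases whose Phred score is below min_q (assumed Phred+33)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def find_adapter(seq, adapter, min_overlap=3, max_mismatch_rate=0.1):
    """Return the leftmost position where the adapter (or an adapter prefix
    running off the read end) matches within the allowed mismatch rate.
    Returns len(seq) when no match is found."""
    for start in range(len(seq) - min_overlap + 1):
        overlap = min(len(adapter), len(seq) - start)
        window = seq[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= int(overlap * max_mismatch_rate):
            return start
    return len(seq)


def trim_read(seq, qual, adapter, min_q=20):
    """Quality-trim the 3' end, then cut the read at the adapter start."""
    seq, qual = trim_quality(seq, qual, min_q)
    cut = find_adapter(seq, adapter)
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter prefix, chosen for illustration
    read = "ACGTACGTAC" + adapter
    quals = "I" * len(read)    # 'I' encodes Phred 40 under Phred+33
    print(trim_read(read, quals, adapter))
```

This per-read scan is quadratic in the worst case and is meant only to show the logic; a production trimmer would use a faster matching strategy and process reads in parallel.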
Source code and scripts to reproduce the results described in this article are freely available at https://github.com/hellosunking/Ktrim/, distributed under the GPL v3 license.
Supplementary data are available at Bioinformatics online.