Shen Wei, Sipos Botond, Zhao Liuyang
Department of Infectious Diseases, Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China.
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK.
Imeta. 2024 Apr 5;3(3):e191. doi: 10.1002/imt2.191. eCollection 2024 Jun.
In the era of ubiquitous high-throughput sequencing studies, there is a growing need for analysis tools that are not just performant but also comprehensive and user-friendly enough to serve both novice and advanced users. This article introduces SeqKit2, the next iteration of the widely used sequence analysis tool SeqKit, featuring expanded functionality, performance optimizations, and support for additional compression methods. While retaining a pragmatic subcommand architecture, SeqKit2 represents a substantial enhancement through the inclusion of 19 additional subcommands, bringing the total to 38 subcommands in eight categories. The new subcommands add functionality such as amplicon processing and robust, error-tolerant parsing of sequence records. In addition, three subcommands designed for real-time analysis are added, supporting periodic monitoring of properties of FASTQ and Binary Alignment/Map (BAM) alignment records and real-time streaming from multiple sequence files. The performance of SeqKit2 is benchmarked against the previous version of SeqKit and against Bioawk, Seqtk, and SeqFu. SeqKit2 consistently outperforms its predecessor, albeit with marginally higher memory usage, while maintaining competitive runtimes against the other tools. With its broad functionality, proven usability, and ongoing development driven by user feedback, we hope that bioinformaticians will find SeqKit2 useful as a "Swiss army knife" of sequence and alignment processing, equally adept at facilitating ad hoc analyses and integrating seamlessly into larger pipelines.