SeqKit2: A Swiss army knife for sequence and alignment processing.

Authors

Shen Wei, Sipos Botond, Zhao Liuyang

Affiliations

Department of Infectious Diseases, Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Institute for Viral Hepatitis, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China.

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK.

Publication information

iMeta. 2024 Apr 5;3(3):e191. doi: 10.1002/imt2.191. eCollection 2024 Jun.

Abstract

In the era of ubiquitous high-throughput sequencing studies, there is a growing need for analysis tools that are not just performant but also comprehensive and user-friendly enough to cater to both novice and advanced users. This article introduces SeqKit2, the next iteration of the widely used sequence analysis tool SeqKit, featuring expanded functionality, performance optimizations, and support for additional compression methods. Retaining a pragmatic subcommand architecture, SeqKit2 represents a substantial enhancement through the inclusion of 19 additional subcommands, expanding its overall repertoire to a total of 38 in eight categories. The new subcommands add functionality such as amplicon processing and robust, error-tolerant parsing of sequence records. In addition, three subcommands designed for real-time analysis are added for periodic monitoring of properties of FASTQ and Binary Alignment/Map (BAM) alignment records and for real-time streaming from multiple sequence files. The performance of SeqKit2 is benchmarked against the previous version of SeqKit and against the Bioawk, Seqtk, and SeqFu tools. SeqKit2 consistently outperforms its predecessor, albeit with marginally higher memory usage, while maintaining competitive runtimes against the other tools. With its broad functionality, proven usability, and ongoing development driven by user feedback, we hope that bioinformaticians will find SeqKit2 useful as a "Swiss army knife" of sequence and alignment processing, equally adept at facilitating ad hoc analyses and integrating seamlessly into larger pipelines.
