Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China.
Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China.
Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae641.
High-throughput sequencing, also called next-generation sequencing (NGS), is increasingly used to address diverse biological questions. NGS data are rich in information, and datasets in repositories such as the Genome Sequence Archive (GSA) at NGDC continue to grow, yet programmatic access to public sequencing data and metadata remains limited.
We developed iSeq to enable quick and straightforward retrieval of metadata and NGS data from multiple databases via the command-line interface. iSeq supports simultaneous retrieval from GSA, SRA, ENA, and DDBJ databases. It handles over 25 different accession formats, supports Aspera downloads, parallel downloads, multi-threaded processes, FASTQ file merging, and integrity verification, simplifying data acquisition and enhancing the capacity for reanalyzing NGS data.
iSeq is freely available on Bioconda (https://anaconda.org/bioconda/iseq) and GitHub (https://github.com/BioOmics/iSeq).
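The integrity verification mentioned above can be illustrated with a generic checksum comparison. The following is a minimal sketch, assuming the repository publishes an MD5 digest for each file; the function names `md5_of_file` and `verify_download` are hypothetical and are not taken from iSeq, whose actual implementation may differ.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Stream the file so large FASTQ archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 with the digest published by the repository.

    `expected_md5` is assumed to be a hex string; surrounding whitespace
    and letter case are ignored.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

A caller would pass the path of a downloaded file together with the digest listed in the run metadata, and re-download the file if `verify_download` returns `False`.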