SeqForge: A scalable platform for alignment-based searches, motif detection, and sequence curation across meta/genomic datasets.

Author information

Horvath Elijah R Bring, Winter Jaclyn M

Affiliation

Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, Utah, 84112, United States.

Publication information

bioRxiv. 2025 Aug 15:2025.08.12.669971. doi: 10.1101/2025.08.12.669971.

Abstract

BACKGROUND

The rapid increase in publicly available microbial and metagenomic data has created a growing demand for tools that can efficiently perform custom large-scale comparative searches and functional annotation. While BLAST+ remains the standard for sequence similarity searches, population-level studies often require custom scripting and manual curation of results, which can present barriers for many researchers.

RESULTS

We developed SeqForge, a scalable, modular command-line toolkit that streamlines alignment-based searches and motif mining across large genomic datasets. SeqForge automates BLAST+ database creation and querying, integrates amino acid motif discovery, enables sequence and contig extraction, and curates results into structured, easily parsed formats. The platform supports diverse input formats, parallelized execution for high-performance computing environments, and built-in visualization tools. Benchmarking demonstrates that SeqForge achieves near-linear runtime scaling for computationally intensive modules while maintaining modest memory usage.
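To make the motif-mining step concrete, the following is a minimal sketch of amino acid motif scanning over FASTA records. It is not SeqForge's implementation or interface: the function names (parse_fasta, motif_to_regex, scan_motif), the convention that X stands for any residue, and the 1-based position reporting are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            fields = line[1:].split()
            # Use the first whitespace-delimited token as the record ID.
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def motif_to_regex(motif: str) -> "re.Pattern":
    """Compile a motif into a regex; 'X' is treated as any residue (assumption)."""
    body = "".join("." if ch == "X" else re.escape(ch) for ch in motif.upper())
    # A capturing group inside a lookahead lets overlapping hits be reported.
    return re.compile(f"(?=({body}))")


def scan_motif(fasta_text: str, motif: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return {record_id: [(1-based start, matched residues), ...]} for hits."""
    pattern = motif_to_regex(motif)
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for record_id, sequence in parse_fasta(fasta_text):
        found = [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]
        if found:
            hits[record_id] = found
    return hits


if __name__ == "__main__":
    demo = ">protA demo\nMKCGHCAA\nCPYC\n>protB\nMKLVV\n"
    for record_id, matches in scan_motif(demo, "CXXC").items():
        for start, residues in matches:
            print(record_id, start, residues, sep="\t")
```

The tab-separated output is one simple way to keep results easy to parse downstream. A production tool would likely also handle compressed input, ambiguous residue codes, and parallel execution across many genomes.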

CONCLUSIONS

SeqForge lowers the computational barrier for large-scale meta/genomic exploration, enabling researchers to perform population-scale BLAST searches, motif detection, and sequence curation without custom scripting. The toolkit is freely available and platform-independent, making it suitable for both personal workstations and high-performance computing environments.
