SeqForge: A scalable platform for alignment-based searches, motif detection, and sequence curation across meta/genomic datasets.

Authors

Horvath Elijah R Bring, Winter Jaclyn M

Affiliations

Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, Utah, 84112, United States.

Publication details

bioRxiv. 2025 Aug 15:2025.08.12.669971. doi: 10.1101/2025.08.12.669971.

DOI:10.1101/2025.08.12.669971
PMID:40832300
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12364017/
Abstract

BACKGROUND

The rapid increase in publicly available microbial and metagenomic data has created a growing demand for tools that can efficiently perform custom large-scale comparative searches and functional annotation. While BLAST+ remains the standard for sequence similarity searches, population-level studies often require custom scripting and manual curation of results, which can present barriers for many researchers.
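To give a sense of the custom scripting involved, here is a minimal Python sketch of the per-genome loop a researcher might otherwise write by hand: one `makeblastdb` call to index each protein FASTA file and one `blastp` call to search it, with a few genomes processed at a time. The flags are standard BLAST+ options; the directory layout, the 1e-5 e-value cutoff, and the helper names `build_commands` and `run_all` are illustrative assumptions, and this is not how SeqForge itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import subprocess


def build_commands(genome: Path, query: Path, outdir: Path, evalue: float = 1e-5):
    """Return the makeblastdb and blastp argument lists for one protein FASTA file.

    Hypothetical helper: output layout and default e-value are assumptions.
    """
    db = outdir / genome.stem  # database prefix, e.g. out/genome1 (assumed layout)
    makedb = ["makeblastdb", "-in", str(genome), "-dbtype", "prot", "-out", str(db)]
    search = [
        "blastp", "-query", str(query), "-db", str(db),
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, one hit per line
        "-out", str(outdir / f"{genome.stem}.tsv"),
    ]
    return makedb, search


def run_all(genomes, query: Path, outdir: Path, workers: int = 4):
    """Index and search every genome; requires BLAST+ on PATH (hypothetical helper)."""
    outdir.mkdir(parents=True, exist_ok=True)

    def run_one(genome: Path) -> str:
        for cmd in build_commands(genome, query, outdir):
            subprocess.run(cmd, check=True)  # raises if BLAST+ exits non-zero
        return genome.stem

    # Threads are enough here because the heavy work runs in BLAST+ child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, genomes))
```

A call such as `run_all(sorted(Path("genomes").glob("*.faa")), Path("query.faa"), Path("blast_out"))` (paths hypothetical) would then leave one tabular result file per genome, which still has to be merged, filtered, and curated by hand.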

RESULTS

We developed SeqForge, a scalable, modular command-line toolkit that streamlines alignment-based searches and motif mining across large genomic datasets. SeqForge automates BLAST+ database creation and querying, integrates amino acid motif discovery, enables sequence and contig extraction, and curates results into structured, easily parsed formats. The platform supports diverse input formats, parallelized execution for high-performance computing environments, and built-in visualization tools. Benchmarking demonstrates that SeqForge achieves near-linear runtime scaling for computationally intensive modules while maintaining modest memory usage.
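Amino acid motif discovery of this kind can be pictured with a small sketch that translates a simplified PROSITE-style pattern (for example the serine-hydrolase motif `G-x-S-x-G`) into a regular expression and scans protein sequences for it. This is an assumption about how such a search could work, not SeqForge's documented method; the supported pattern subset and the function names `prosite_to_regex` and `find_motifs` are hypothetical.

```python
import re

# Hypothetical subset of PROSITE syntax: x, single residues, [ABC], {ABC}, (n) and (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE pattern such as 'G-x-S-x-G' to a Python regex.

    Terminal anchors '<' and '>' are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":
            residue = "."  # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # excluded residues
        if lo is not None:
            residue += "{" + lo + ("," + hi if hi else "") + "}"  # repeat count
        parts.append(residue)
    return "".join(parts)


def find_motifs(records: dict[str, str], pattern: str):
    """Yield (sequence id, 1-based start, matched residues) for every motif hit."""
    # The zero-width lookahead lets overlapping occurrences be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    for seq_id, seq in records.items():
        for m in regex.finditer(seq.upper()):
            yield seq_id, m.start() + 1, m.group(1)
```

For instance, `list(find_motifs({"p1": "AAGASAGTSAG"}, "G-x-S-x-G"))` reports both `GASAG` at position 3 and `GTSAG` at position 7, even though they share a glycine; a plain `finditer` without the lookahead would miss overlaps like this in short, degenerate motifs.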

CONCLUSIONS

SeqForge lowers the computational barrier for large-scale meta/genomic exploration, enabling researchers to perform population-scale BLAST searches, motif detection, and sequence curation without custom scripting. The toolkit is freely available and platform-independent, making it suitable for both personal workstations and high-performance computing environments.
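Result curation can likewise be sketched: parse BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in `FIELDS`), keep hits that pass identity and e-value thresholds, rank them by bit score, export CSV, and pull the matched region (plus optional flanks) out of a nucleotide contig, reverse-complementing when `sstart > send` (a minus-strand hit). The thresholds, the flank handling, and every function name here are illustrative assumptions, not SeqForge's actual output schema.

```python
import csv
import io

# The twelve default columns of BLAST+ -outfmt 6.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_outfmt6(text: str, min_pident: float = 30.0, max_evalue: float = 1e-5):
    """Parse default-column tabular hits, filter them, and sort by bit score (descending).

    The threshold defaults are arbitrary assumptions for illustration.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):  # skip blanks and outfmt 7 comments
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            row[key] = int(row[key])
        if row["pident"] >= min_pident and row["evalue"] <= max_evalue:
            hits.append(row)
    return sorted(hits, key=lambda r: r["bitscore"], reverse=True)


def extract_region(contig: str, sstart: int, send: int, flank: int = 0) -> str:
    """Return the 1-based, inclusive hit region plus flanks, clipped to the contig."""
    lo, hi = sorted((sstart, send))
    lo = max(1, lo - flank)
    hi = min(len(contig), hi + flank)
    region = contig[lo - 1:hi]
    if sstart > send:  # minus-strand hit: report the reverse complement
        region = region.translate(COMPLEMENT)[::-1]
    return region


def to_csv(hits) -> str:
    """Serialize curated hits as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(hits)
    return buf.getvalue()
```

Keeping parsing, filtering, and extraction as separate small functions makes each threshold easy to change without touching the rest of the pipeline.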

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cac9/12364017/5915832f55fe/nihpp-2025.08.12.669971v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cac9/12364017/e2a36cf0df1e/nihpp-2025.08.12.669971v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cac9/12364017/9bd296809c0a/nihpp-2025.08.12.669971v1-f0002.jpg

Similar articles

1
SeqForge: A scalable platform for alignment-based searches, motif detection, and sequence curation across meta/genomic datasets.
bioRxiv. 2025 Aug 15:2025.08.12.669971. doi: 10.1101/2025.08.12.669971.
2
GRAPEVNE - Graphical Analytical Pipeline Development Environment for Infectious Diseases.
Wellcome Open Res. 2025 May 27;10:279. doi: 10.12688/wellcomeopenres.23824.1. eCollection 2025.
3
Prescription of Controlled Substances: Benefits and Risks.
4
Psychological therapies for panic disorder with or without agoraphobia in adults: a network meta-analysis.
Cochrane Database Syst Rev. 2016 Apr 13;4(4):CD011004. doi: 10.1002/14651858.CD011004.pub2.
5
Metagenomics-Toolkit: the flexible and efficient cloud-based metagenomics workflow featuring machine learning-enabled resource allocation.
NAR Genom Bioinform. 2025 Jul 17;7(3):lqaf093. doi: 10.1093/nargab/lqaf093. eCollection 2025 Sep.
6
CrusTome: a transcriptome database resource for large-scale analyses across Crustacea.
G3 (Bethesda). 2023 Jul 5;13(7). doi: 10.1093/g3journal/jkad098.
7
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
8
A Cloud-Based Platform for Harmonized COVID-19 Data: Design and Implementation of the Rapid Acceleration of Diagnostics (RADx) Data Hub.
JMIR Public Health Surveill. 2025 Aug 20;11:e72677. doi: 10.2196/72677.
9
Decontamination of DNA sequences from a Streptomyces genome for optimal genome mining.
Braz J Microbiol. 2025 Mar;56(1):79-89. doi: 10.1007/s42770-024-01598-2. Epub 2025 Jan 15.
10
PDF Entity Annotation Tool (PEAT).
J Open Source Softw. 2025 Apr 8;10(108):5336. doi: 10.21105/joss.05336.