SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria.

Affiliations

Department of Genetics.

Department of Bacteriology and J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA.

Publication information

Bioinformatics. 2017 Oct 15;33(20):3202-3210. doi: 10.1093/bioinformatics/btx400.

DOI:10.1093/bioinformatics/btx400

PMID:28633438

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5860034/

Abstract

SUMMARY

Nonribosomally synthesized peptides (NRPs) are natural products with widespread applications in medicine and biotechnology. Many algorithms have been developed to predict the substrate specificities of nonribosomal peptide synthetase adenylation (A) domains from DNA sequences, which enables prioritization and dereplication, and integration with other data types in discovery efforts. However, insufficient training data and a lack of clarity regarding prediction quality have impeded optimal use. Here, we introduce prediCAT, a new phylogenetics-inspired algorithm, which quantitatively estimates the degree of predictability of each A-domain. We then systematically benchmarked all algorithms on a newly gathered, independent test set of 434 A-domain sequences, showing that active-site-motif-based algorithms outperform whole-domain-based methods. Subsequently, we developed SANDPUMA, a powerful ensemble algorithm, based on newly trained versions of all high-performing algorithms, which significantly outperforms individual methods. Finally, we deployed SANDPUMA in a systematic investigation of 7635 Actinobacteria genomes, suggesting that NRP chemical diversity is much higher than previously estimated. SANDPUMA has been integrated into the widely used antiSMASH biosynthetic gene cluster analysis pipeline and is also available as an open-source, standalone tool.

AVAILABILITY AND IMPLEMENTATION

SANDPUMA is freely available at https://bitbucket.org/chevrm/sandpuma and as a docker image at https://hub.docker.com/r/chevrm/sandpuma/ under the GNU Public License 3 (GPL3).

CONTACT

chevrette@wisc.edu or marnix.medema@wur.nl.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
