SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria.

Author information

Department of Genetics.

Department of Bacteriology and J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA.

Publication information

Bioinformatics. 2017 Oct 15;33(20):3202-3210. doi: 10.1093/bioinformatics/btx400.

Abstract

SUMMARY

Nonribosomally synthesized peptides (NRPs) are natural products with widespread applications in medicine and biotechnology. Many algorithms have been developed to predict the substrate specificities of nonribosomal peptide synthetase adenylation (A) domains from DNA sequences, which enables prioritization and dereplication, and integration with other data types in discovery efforts. However, insufficient training data and a lack of clarity regarding prediction quality have impeded optimal use. Here, we introduce prediCAT, a new phylogenetics-inspired algorithm, which quantitatively estimates the degree of predictability of each A-domain. We then systematically benchmarked all algorithms on a newly gathered, independent test set of 434 A-domain sequences, showing that active-site-motif-based algorithms outperform whole-domain-based methods. Subsequently, we developed SANDPUMA, a powerful ensemble algorithm, based on newly trained versions of all high-performing algorithms, which significantly outperforms individual methods. Finally, we deployed SANDPUMA in a systematic investigation of 7635 Actinobacteria genomes, suggesting that NRP chemical diversity is much higher than previously estimated. SANDPUMA has been integrated into the widely used antiSMASH biosynthetic gene cluster analysis pipeline and is also available as an open-source, standalone tool.
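To illustrate the general idea of combining several A-domain predictors into one call, here is a minimal sketch of a majority-vote ensemble. It is not SANDPUMA's actual algorithm, which the abstract does not describe in detail. The predictor names, the `no_call` label, and the `min_agreement` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

NO_CALL = "no_call"  # hypothetical label meaning "no confident prediction"


def ensemble_vote(predictions: Dict[str, Optional[str]], min_agreement: int = 2) -> str:
    """Return the substrate named by the most predictors, or NO_CALL.

    `predictions` maps a (hypothetical) predictor name to its substrate call;
    None or NO_CALL means that predictor abstained.
    """
    votes = Counter(p for p in predictions.values() if p and p != NO_CALL)
    if not votes:
        return NO_CALL
    ranked = votes.most_common()
    top_substrate, top_count = ranked[0]
    # Abstain when the top two substrates tie or too few predictors agree.
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_count < min_agreement:
        return NO_CALL
    return top_substrate


def accuracy(calls: List[str], truth: List[str]) -> float:
    """Fraction of test-set A-domains whose ensemble call matches the known substrate."""
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    if not truth:
        return 0.0
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


if __name__ == "__main__":
    # Hypothetical per-domain predictions from three illustrative predictors.
    domain_predictions = [
        {"motif_based": "val", "whole_domain": "val", "phylogenetic": "leu"},
        {"motif_based": "ser", "whole_domain": None, "phylogenetic": "thr"},
    ]
    calls = [ensemble_vote(p) for p in domain_predictions]
    print(calls, accuracy(calls, ["val", "ser"]))
```

Abstaining on ties and weak agreement reflects the abstract's emphasis on knowing how predictable each A-domain is. A real ensemble would more likely weight each predictor by its benchmarked performance.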

AVAILABILITY AND IMPLEMENTATION

SANDPUMA is freely available at https://bitbucket.org/chevrm/sandpuma and as a docker image at https://hub.docker.com/r/chevrm/sandpuma/ under the GNU Public License 3 (GPL3).

CONTACT

chevrette@wisc.edu or marnix.medema@wur.nl.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
