BioTechnology Institute, University of Minnesota, Saint Paul, Minnesota, USA; Graduate Program in Bioinformatics and Computational Biology, University of Minnesota, Rochester, Minnesota, USA; Graduate Program in Microbiology, Immunology, and Cancer Biology, University of Minnesota, Minneapolis, Minnesota, USA.
Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
J Biol Chem. 2020 Oct 30;295(44):14826-14839. doi: 10.1074/jbc.RA120.013528. Epub 2020 Aug 21.
Enzymes that cleave ATP to activate carboxylic acids play essential roles in primary and secondary metabolism in all domains of life. Class I adenylate-forming enzymes share a conserved structural fold but act on a wide range of substrates to catalyze reactions involved in bioluminescence, nonribosomal peptide biosynthesis, fatty acid activation, and β-lactone formation. Despite their metabolic importance, the substrates and functions of the vast majority of adenylate-forming enzymes are unknown, and no tools are available to predict them accurately. Given the crucial roles of adenylate-forming enzymes in biosynthesis, this also severely limits our ability to predict natural product structures from biosynthetic gene clusters. Here we used machine learning to predict adenylate-forming enzyme function and substrate specificity from protein sequences. We built a web-based predictive tool and used it to comprehensively map the biochemical diversity of adenylate-forming enzymes across >50,000 candidate biosynthetic gene clusters in bacterial, fungal, and plant genomes. Ancestral phylogenetic reconstruction and sequence similarity networking of enzymes from these clusters suggested divergent evolution of the adenylate-forming superfamily from a core enzyme scaffold most related to contemporary CoA ligases toward more specialized functions, including β-lactone synthetases. Our classifier predicted β-lactone synthetases in uncharacterized biosynthetic gene clusters conserved in >90 different strains of […]. To test our prediction, we purified a candidate β-lactone synthetase from […] and reconstituted the biosynthetic pathway to link the gene cluster to the β-lactone natural product nocardiolactone. We anticipate that our machine learning approach will aid in functional classification of enzymes and advance natural product discovery.