Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
Cornell University, Ithaca, NY, 14850, USA.
Nat Commun. 2023 Jul 14;14(1):4219. doi: 10.1038/s41467-023-39905-4.
Recent analyses of public microbial genomes have found over a million biosynthetic gene clusters, most of whose natural products remain unknown. Additionally, GNPS harbors billions of mass spectra of natural products whose structures and biosynthetic genes are unknown. We bridge the gap between large-scale genome mining and mass spectral datasets for natural product discovery by developing HypoRiPPAtlas, an atlas of hypothetical natural product structures that is ready to use for in silico database search of tandem mass spectra. HypoRiPPAtlas is constructed by mining genomes using seq2ripp, a machine-learning tool for the prediction of ribosomally synthesized and post-translationally modified peptides (RiPPs). In HypoRiPPAtlas, we identify RiPPs in microbes and plants. HypoRiPPAtlas could be extended to other natural product classes in the future by implementing the corresponding biosynthetic logic. This study paves the way for large-scale exploration of the biosynthetic pathways and chemical structures of microbial and plant RiPP classes.