Tietz Jonathan I, Schwalen Christopher J, Patel Parth S, Maxson Tucker, Blair Patricia M, Tai Hua-Chia, Zakai Uzma I, Mitchell Douglas A
Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Nat Chem Biol. 2017 May;13(5):470-478. doi: 10.1038/nchembio.2319. Epub 2017 Feb 28.
Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are attractive for genome-driven discovery and re-engineering, but limitations in bioinformatic methods and exponentially increasing genomic data make large-scale mining of RiPP data difficult. We report RODEO (Rapid ORF Description and Evaluation Online), which combines hidden-Markov-model-based analysis, heuristic scoring, and machine learning to identify biosynthetic gene clusters and predict RiPP precursor peptides. We initially focused on lasso peptides, which display intriguing physicochemical properties and bioactivities but whose hypervariability makes them challenging targets for automated mining. Our approach yielded the most comprehensive mapping to date of lasso peptide space, revealing >1,300 compounds. We characterized the structures and bioactivities of six lasso peptides, prioritized based on predicted structural novelty, including one with an unprecedented handcuff-like topology and another with a citrulline modification exceptionally rare among bacteria. These combined insights significantly expand the knowledge of lasso peptides and, more broadly, provide a framework for future genome-mining efforts.