Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands.
PLoS One. 2013 Apr 18;8(4):e62136. doi: 10.1371/journal.pone.0062136. Print 2013.
There is growing interest in the non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) of microbes, fungi and plants because they can produce bioactive peptides such as antibiotics. The ability to identify the substrate specificity of the enzymes' adenylation (A) and acyl-transferase (AT) domains is essential to rationally deduce or engineer new products. Here we report a Hidden Markov Model (HMM)-based ensemble method that predicts substrate specificity with high quality. We collected a new reference set of experimentally validated sequences. An initial classification based on alignment and neighbor joining was performed, in line with most previously published prediction methods. We then created and tested single substrate-specific HMMs and found that their use significantly improved correct identification for both A and AT domains. A major advantage of HMMs is that they remove the dependency on multiple sequence alignment and residue selection that hampers the alignment-based clustering methods. Using our models, we obtained a high prediction quality for the substrate specificity of the A domains, similar to that of two recently published tools that make use of HMMs or Support Vector Machines (NRPSsp and NRPS predictor2, respectively). Moreover, replacing the single substrate-specific HMMs with ensembles of models caused a clear increase in prediction quality. We argue that the superiority of the ensemble over the single model is caused by the way substrate specificity evolves in the studied systems. It is likely that this also holds true for other protein domains. The ensemble predictor has been implemented in a simple web-based tool that is available at http://www.cmbi.ru.nl/NRPS-PKS-substrate-predictor/.
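The difference between a single substrate-specific model and an ensemble of models can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: ungapped position-specific log-odds profiles stand in for real HMMs, the sequences are invented six-residue strings, the background distribution is uniform, and the ensemble is combined by taking the best-scoring member per substrate (the abstract does not state how the published tool combines its models). The function names `build_profile`, `score` and `predict` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies


def build_profile(seqs, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length, ungapped sequences.

    This is a simplified stand-in for a profile HMM (no insert/delete states).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("toy profiles need equal-length sequences")
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum the log-odds over all columns for one query sequence."""
    if len(seq) != len(profile):
        raise ValueError("query length must match profile length")
    return sum(column[aa] for column, aa in zip(profile, seq))


def predict(ensembles, seq):
    """Return (best substrate, per-substrate scores).

    `ensembles` maps a substrate label to a list of profiles; a substrate's
    score is the score of its best-matching member (an assumed combination rule).
    """
    best = {
        substrate: max(score(p, seq) for p in profiles)
        for substrate, profiles in ensembles.items()
    }
    return max(best, key=best.get), best


# Invented training data: one substrate ("Ala") represented by two unrelated
# sequence subfamilies, and a second substrate ("Gly") with a single family.
ala_group1 = ["ACDEFG", "ACDEFA", "ACDEYG"]
ala_group2 = ["WYVTSR", "WYVTSK", "WYVTAR"]
gly_group = ["GGGGGG", "GGGGGA", "GGGAGG"]

ensembles = {
    "Ala": [build_profile(ala_group1), build_profile(ala_group2)],
    "Gly": [build_profile(gly_group)],
}
# A single pooled model mixes both Ala subfamilies into one profile.
pooled_ala = build_profile(ala_group1 + ala_group2)

query = "WYVTSR"
winner, best = predict(ensembles, query)
print("ensemble prediction:", winner)
print("ensemble Ala score:", round(best["Ala"], 3))
print("pooled Ala score:", round(score(pooled_ala, query), 3))
```

In this toy setting the subfamily-specific ensemble member scores the query higher than the pooled profile, because pooling two unrelated subfamilies dilutes the per-column residue frequencies. This is one possible reading of why ensembles could help when the same substrate specificity is carried by divergent sequence groups; it is not a reproduction of the published results.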