Chemistry Research Laboratory, Oxford University, Oxford, UK.
UCL School of Pharmacy, London, UK.
Nat Chem Biol. 2018 Dec;14(12):1109-1117. doi: 10.1038/s41589-018-0154-9. Epub 2018 Nov 12.
The elucidation and prediction of how changes in a protein result in altered activities and selectivities remain a major challenge in chemistry. Two hurdles have prevented accurate family-wide models: obtaining (i) diverse datasets and (ii) suitable parameter frameworks that encapsulate activities in large sets. Here, we show that a relatively small but broad activity dataset is sufficient to train algorithms for functional prediction over the entire glycosyltransferase superfamily 1 (GT1) of the plant Arabidopsis thaliana. Whereas sequence analysis alone failed for GT1 substrate utilization patterns, our chemical-bioinformatic model, GT-Predict, succeeded by coupling physicochemical features with isozyme-recognition patterns over the family. GT-Predict identified GT1 biocatalysts for novel substrates and enabled functional annotation of uncharacterized GT1s. Finally, analyses of GT-Predict decision pathways revealed structural modulators of substrate recognition, thus providing information on mechanisms. This multifaceted approach to enzyme prediction may guide the streamlined utilization (and design) of biocatalysts and the discovery of other family-wide protein functions.
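The abstract describes coupling physicochemical substrate features with enzyme recognition patterns and then inspecting the model's "decision pathways". As a rough illustration only, the sketch below fits a single decision split on two physicochemical descriptors. It is not the GT-Predict implementation. The feature names (`logP`, `mol_weight`), the data values, and the functions `gini` and `fit_stump` are all hypothetical and invented for this example; none of them come from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical toy data, invented for illustration (NOT from the paper):
# each row is (logP, mol_weight) for a candidate acceptor substrate;
# label 1 = glycosylated by an imaginary GT1 enzyme, 0 = not glycosylated.
FEATURES = ["logP", "mol_weight"]
X: List[Sequence[float]] = [
    (0.5, 150.0), (1.0, 180.0), (0.75, 310.0),
    (3.0, 300.0), (3.5, 320.0), (3.25, 160.0),
]
y: List[int] = [1, 1, 1, 0, 0, 0]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(X: List[Sequence[float]], y: List[int]) -> Dict[str, float]:
    """Find the single feature/threshold split with the lowest weighted Gini."""
    n = len(y)
    best: Dict[str, float] = {}
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if not best or score < best["score"]:
                best = {
                    "feature": f,
                    "threshold": t,
                    "score": score,
                    # Majority label on each side of the split.
                    "left_label": int(sum(left) * 2 >= len(left)),
                    "right_label": int(sum(right) * 2 >= len(right)),
                }
    return best


def predict(stump: Dict[str, float], row: Sequence[float]) -> int:
    """Follow the one-step decision path for a new substrate."""
    if row[int(stump["feature"])] <= stump["threshold"]:
        return int(stump["left_label"])
    return int(stump["right_label"])


stump = fit_stump(X, y)
print(f"split on {FEATURES[int(stump['feature'])]} <= {stump['threshold']}")
```

On this toy data the chosen split can be read directly as a rule about which descriptor separates accepted from rejected substrates, which is the sense in which a decision path is inspectable. A real family-wide model would need many enzymes, many more descriptors, and proper validation.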