Department of Biotechnology, The University of Tokyo, Japan.
Department of Biotechnology, The University of Tokyo, Japan; Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Japan.
Comput Biol Chem. 2022 Oct;100:107744. doi: 10.1016/j.compbiolchem.2022.107744. Epub 2022 Jul 23.
In this study, we developed a system that predicts the binding sites of proteins for five mononucleotides (AMP, ADP, ATP, GDP, and GTP). The system comprises two machine learning (ML)-based predictors, one using a convolutional neural network and one using a gradient boosting machine; two template-based predictors, based on sequence alignment and structure alignment respectively; and a predictor that performs ensemble learning over these four predictors. To augment the training data, binding sites of ligands with similar structures were also used. For example, when predicting ADP-binding sites with the ML methods, the binding sites of AMP and ATP, which are structurally similar to ADP, are also considered. In addition, we constructed structure models using AlphaFold2, a highly accurate protein structure prediction method. The secondary structure and dihedral angle information obtained from the model structures were used as features for the ML predictors. In the structure-based template predictor, the structures of known binding sites were used as templates and searched by structure alignment to identify the binding site of the target. Among the four individual predictors, the template-based predictor based on structure alignment performed best, and the ensemble predictor achieved the best performance overall, with an area under the curve of 0.958 for all mononucleotides.
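The data augmentation step can be illustrated with a minimal sketch. The abstract states only that AMP and ATP binding sites are considered when predicting ADP-binding sites; how those sites are labeled or weighted is not specified. The sketch below assumes they are simply pooled as additional positive examples. The names `SIMILAR_LIGANDS`, `augmented_positives`, and the toy protein identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A binding-site residue is identified by (protein identifier, residue index).
Site = Tuple[str, int]

# Only the ADP entry comes from the abstract; groupings for the other
# ligands are not stated there and are deliberately left out.
SIMILAR_LIGANDS: Dict[str, List[str]] = {"ADP": ["AMP", "ATP"]}


def augmented_positives(target: str,
                        sites_by_ligand: Dict[str, Set[Site]]) -> Set[Site]:
    """Pool the target ligand's binding residues with those of similar ligands.

    Assumption: borrowed sites are treated as ordinary positives.
    """
    ligands = [target] + SIMILAR_LIGANDS.get(target, [])
    merged: Set[Site] = set()
    for ligand in ligands:
        merged |= sites_by_ligand.get(ligand, set())
    return merged


# Toy annotation table (invented values, for illustration only).
sites_by_ligand: Dict[str, Set[Site]] = {
    "AMP": {("protA", 12), ("protA", 13)},
    "ADP": {("protB", 40)},
    "ATP": {("protC", 7), ("protB", 40)},
    "GTP": {("protD", 99)},
}

adp_training = augmented_positives("ADP", sites_by_ligand)
gtp_training = augmented_positives("GTP", sites_by_ligand)
```

Here `adp_training` contains the ADP site plus the AMP and ATP sites, while `gtp_training` is unchanged because no similar ligands are listed for GTP in this sketch.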
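The idea of combining the four predictors and scoring the result by area under the curve can be sketched as follows. The abstract does not say how the ensemble predictor is trained, so this sketch substitutes a plain weighted average of per-residue scores as a stand-in; it is not the authors' ensemble method. The predictor names, weights, and toy scores are hypothetical.

```python
from typing import Dict, List, Sequence


def ensemble_scores(per_predictor: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-residue binding scores from several predictors."""
    names = list(per_predictor)
    n = len(per_predictor[names[0]])
    if any(len(per_predictor[name]) != n for name in names):
        raise ValueError("all predictors must score the same residues")
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * per_predictor[name][i] for name in names) / total
        for i in range(n)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a binding residue outscores a non-binding one.

    Ties count as half a win. This pairwise form is quadratic in the number of
    residues, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six residues, 1 = binding, 0 = non-binding (invented values).
labels = [0, 1, 1, 0, 0, 1]
scores = {
    "cnn": [0.2, 0.7, 0.4, 0.5, 0.1, 0.8],
    "gbm": [0.3, 0.6, 0.5, 0.4, 0.2, 0.9],
    "seq_template": [0.0, 1.0, 0.0, 0.0, 0.0, 1.0],
    "struct_template": [0.1, 0.9, 0.8, 0.2, 0.0, 1.0],
}
equal_weights = {name: 1.0 for name in scores}

combined = ensemble_scores(scores, equal_weights)
auc_cnn = roc_auc(labels, scores["cnn"])
auc_combined = roc_auc(labels, combined)
```

On this toy data the combined scores rank every binding residue above every non-binding one, whereas the CNN scores alone misorder one pair. A learned combiner, such as a classifier trained on the four scores, would replace the fixed weights in a fuller treatment.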