National Energy R&D Center for Biorefinery, Beijing University of Chemical Technology, 100029, Beijing, China.
J Chem Inf Model. 2023 Jul 24;63(14):4277-4290. doi: 10.1021/acs.jcim.3c00273. Epub 2023 Jul 3.
Determining the catalytic sites of enzymes greatly helps in understanding the relationship between protein sequence, structure, and function, and provides the basis and targets for designing, modifying, and enhancing enzyme activity. The unique local spatial configuration that binds the substrate at the active center of an enzyme determines its catalytic ability and plays an important role in catalytic site prediction. Graph neural networks are well suited to this task: because of their ability to characterize the three-dimensional structural features of proteins, they can better identify residue sites with unique local spatial configurations. Accordingly, a novel model for predicting enzyme catalytic sites has been developed, which incorporates a specially designed adaptive edge-gated graph attention neural network (AEGAN). The model handles sequential and structural characteristics of proteins at various levels. By sampling the local space around candidate residues and using specially designed features for the physical and chemical properties of amino acids, the extracted features accurately describe the local spatial configuration of the enzyme active site. To evaluate its performance, the model was compared with existing catalytic site prediction models on different benchmark datasets and achieved the best results on each of them. On the independent test set constructed for evaluation, the model exhibited a sensitivity of 0.9659, an accuracy of 0.9226, and an area under the precision-recall curve (AUPRC) of 0.9241. Furthermore, the F1-score of this model is nearly four times higher than that of the best-performing similar model in previous studies. The model can serve as a valuable tool to help researchers understand protein sequence-structure-function relationships while facilitating the characterization of novel enzymes of unknown function.
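For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity, accuracy, F1-score, and an AUPRC estimate can be computed from per-residue predictions. It assumes binary labels (1 = catalytic residue, 0 = non-catalytic) and uses average precision as one common estimator of AUPRC. The function names and toy data are hypothetical, and the abstract does not state the authors' exact evaluation procedure.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, accuracy, precision, and F1 for binary per-residue labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0  # harmonic mean
    return {"sensitivity": sensitivity, "accuracy": accuracy,
            "precision": precision, "f1": f1}


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision at the rank of each true positive.

    Used here as an estimator of AUPRC; assumes no tied scores.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy example (hypothetical data): 8 residues, 3 of them catalytic.
labels = [1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.1, 0.4, 0.3, 0.2, 0.05]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(labels, preds)
auprc = average_precision(labels, scores)
```

Catalytic residues are typically a small minority of all residues in a protein, so accuracy alone can look high even for a weak predictor. This is why precision-recall based measures such as F1 and AUPRC are informative for this kind of task.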