Canner Samuel W, Shanker Sudhanshu, Gray Jeffrey J
Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States.
Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, United States.
Front Bioinform. 2023 Jun 20;3:1186531. doi: 10.3389/fbinf.2023.1186531. eCollection 2023.
Carbohydrates dynamically and transiently interact with proteins for cell-cell recognition, cellular differentiation, immune response, and many other cellular processes. Despite the molecular importance of these interactions, there are currently few reliable computational tools to predict potential carbohydrate-binding sites on any given protein. Here, we present two deep learning (DL) models, collectively named CArbohydrate-Protein interaction Site IdentiFier (CAPSIF), that predict non-covalent carbohydrate-binding sites on proteins: (1) a 3D-UNet voxel-based neural network model (CAPSIF:V) and (2) an equivariant graph neural network model (CAPSIF:G). While both models outperform previous surrogate methods used for carbohydrate-binding site prediction, CAPSIF:V performs better than CAPSIF:G, achieving test set Dice scores of 0.597 and 0.543 and test set Matthews correlation coefficients (MCCs) of 0.599 and 0.538, respectively. We further tested CAPSIF:V on AlphaFold2-predicted protein structures, where it performed equivalently to its performance on experimentally determined structures. Finally, we demonstrate how CAPSIF models can be used in conjunction with local glycan-docking protocols, such as GlycanDock, to predict bound protein-carbohydrate structures.
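For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a Dice score and an MCC can be computed from per-residue binary labels (1 = carbohydrate-binding, 0 = non-binding). It uses the standard textbook definitions; the function names and the toy labels are illustrative assumptions, and the abstract does not state how the authors aggregated scores across structures, so this should not be read as their evaluation code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def dice_score(y_true, y_pred):
    """Dice = 2*TP / (2*TP + FP + FN). Returns 1.0 if both label sets are empty."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient. Returns 0.0 when the denominator is zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy example: 8 residues, 3 of which truly bind carbohydrate.
true_sites = [1, 1, 0, 0, 1, 0, 0, 0]
pred_sites = [1, 0, 0, 0, 1, 1, 0, 0]

print(round(dice_score(true_sites, pred_sites), 3))
print(round(mcc(true_sites, pred_sites), 3))
```

Dice ignores true negatives, while MCC uses all four confusion-matrix cells, which is why the two are often reported together when binding residues are a small minority of each protein.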