Canner Samuel W, Lu Lei, Takeshita Sho S, Gray Jeffrey J
Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD, United States.
Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143, United States.
bioRxiv. 2025 Sep 6:2025.09.02.673778. doi: 10.1101/2025.09.02.673778.
Advances in deep learning have produced a range of models for predicting the protein-sugar interactome; however, structural docking of noncovalent protein-carbohydrate complexes remains largely unexplored. All-atom structure prediction models such as AlphaFold3 (AF3), Boltz-1, Chai-1, DiffDock, and RoseTTAFold All-Atom (RFAA) have been validated on protein-small molecule complexes, but no benchmark or evaluation exists specifically for noncovalent protein-carbohydrate docking. To address this gap, we developed a high-quality dataset of experimental structures, the Benchmark of CArbohydrate Protein Interactions (BCAPIN). Using BCAPIN and a novel evaluation metric, DockQC, we assessed the performance of all-atom structure prediction models on noncovalent protein-carbohydrate docking. All methods performed comparably, achieving an 85% success rate in producing structures of at least acceptable quality. However, the predictive power of every model declined as carbohydrate polymer length increased. Having established these capabilities and limitations, we evaluated AF3's ability to predict binding for a set of putative human carbohydrate-binding and non-carbohydrate-binding proteins. While current models show promise, further development is needed to enable high-confidence, high-throughput prediction of the complete protein-sugar interactome.
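The 85% figure is a success rate: the fraction of predicted complexes whose score meets an "acceptable" quality cutoff. The minimal sketch below tallies that rate overall and by carbohydrate polymer length, which is how a length-dependent decline would show up. The abstract does not define DockQC or its quality thresholds, so the 0.23 cutoff is borrowed from DockQ's "acceptable" boundary purely as a hypothetical stand-in. The function names and the toy records are illustrative assumptions, not BCAPIN data or results.

```python
from collections import defaultdict

# Hypothetical cutoff borrowed from DockQ's "acceptable" boundary (0.23).
# Whether DockQC uses this value is an assumption for illustration only.
ACCEPTABLE_CUTOFF = 0.23


def success_rate(scores, cutoff=ACCEPTABLE_CUTOFF):
    """Return the fraction of scores at or above `cutoff` (0.0 for no scores)."""
    if not scores:
        return 0.0
    return sum(score >= cutoff for score in scores) / len(scores)


def success_rate_by_length(records, cutoff=ACCEPTABLE_CUTOFF):
    """Group (polymer_length, score) pairs by length; return {length: success rate}."""
    groups = defaultdict(list)
    for length, score in records:
        groups[length].append(score)
    # Iterate lengths in ascending order so the trend with length is easy to read.
    return {length: success_rate(group, cutoff) for length, group in sorted(groups.items())}


if __name__ == "__main__":
    # Toy (polymer_length, score) pairs, invented for illustration only.
    toy_records = [(1, 0.9), (1, 0.5), (2, 0.4), (2, 0.1), (4, 0.2)]
    print(success_rate([score for _, score in toy_records]))
    print(success_rate_by_length(toy_records))
```

Grouping by length before computing the rate keeps each polymer length's denominator separate, so a drop for longer carbohydrates is not masked by the more numerous short ones.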