McFee Matthew, Kim Philip M
Department of Molecular Genetics, The University of Toronto, Toronto, ON M5S 1A8, Canada.
Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada.
Bioinform Adv. 2023 Jun 12;3(1):vbad072. doi: 10.1093/bioadv/vbad072. eCollection 2023.
Protein complexes play vital roles in a variety of biological processes, such as mediating biochemical reactions, the immune response and cell signalling, and their 3D structure specifies their function. Computational docking methods provide a means to determine the interface between two complexed polypeptide chains without resorting to time-consuming experimental techniques. The docking process requires a scoring function to select the optimal solution. Here, we propose a novel graph-based deep learning model that uses mathematical graph representations of proteins to learn a scoring function (GDockScore). GDockScore was pre-trained on docking outputs generated with Protein Data Bank biounits and the RosettaDock protocol, and then fine-tuned on HADDOCK decoys generated on the ZDOCK Protein Docking Benchmark. GDockScore performs similarly to the Rosetta scoring function on docking decoys generated using the RosettaDock protocol. Furthermore, GDockScore achieves state-of-the-art performance on the CAPRI score set, a challenging dataset for developing docking scoring functions.
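To make the "graph representation plus scoring function" idea concrete, here is a minimal sketch under stated assumptions. It is not GDockScore and does not reproduce its architecture or features. It assumes each residue is reduced to a single point (a hypothetical Cα coordinate), connects residues whose points lie within a hypothetical 8 Å cutoff, and flags edges that cross between the two chains as interface edges. A trivial placeholder score (negated count of interface edges, lower is better) stands in for the learned graph neural network, and the decoy with the lowest score is selected, mirroring how a docking scoring function picks a solution from a set of decoys. The names `build_graph`, `toy_score` and `select_best` are illustrative only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical residue representation: (chain_id, residue_index, (x, y, z) of Cα in Å).
Residue = Tuple[str, int, Tuple[float, float, float]]


def build_graph(residues: List[Residue], cutoff: float = 8.0) -> List[Tuple[int, int, bool]]:
    """Return edges (i, j, is_interface) between residues closer than `cutoff` Å.

    `is_interface` is True when the two residues belong to different chains.
    """
    edges = []
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            chain_i, _, pos_i = residues[i]
            chain_j, _, pos_j = residues[j]
            if math.dist(pos_i, pos_j) <= cutoff:
                edges.append((i, j, chain_i != chain_j))
    return edges


def toy_score(residues: List[Residue], cutoff: float = 8.0) -> int:
    """Placeholder for a learned scoring function: negated count of interface edges."""
    return -sum(1 for _, _, is_interface in build_graph(residues, cutoff) if is_interface)


def select_best(decoys: Dict[str, List[Residue]]) -> str:
    """Pick the decoy with the lowest (best) placeholder score."""
    return min(decoys, key=lambda name: toy_score(decoys[name]))


# Two hypothetical decoys of the same two-chain complex: chain B docked close vs far away.
decoys = {
    "decoy_a": [("A", 1, (0.0, 0.0, 0.0)), ("A", 2, (3.8, 0.0, 0.0)),
                ("B", 1, (0.0, 5.0, 0.0)), ("B", 2, (3.8, 5.0, 0.0))],
    "decoy_b": [("A", 1, (0.0, 0.0, 0.0)), ("A", 2, (3.8, 0.0, 0.0)),
                ("B", 1, (0.0, 30.0, 0.0)), ("B", 2, (3.8, 30.0, 0.0))],
}
print(select_best(decoys))  # decoy_a
```

In a learned model such as the one described here, the hand-written `toy_score` would be replaced by a network that reads node and edge features from the graph; the decoy-ranking step around it stays the same.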
The model implementation is available at https://gitlab.com/mcfeemat/gdockscore.
Supplementary data are available at Bioinformatics Advances online.