Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea.
Galux Inc., Gwanak-gu, Seoul 08738, Republic of Korea.
J Chem Inf Model. 2022 Jul 11;62(13):3157-3168. doi: 10.1021/acs.jcim.2c00306. Epub 2022 Jun 24.
Proteins interact with numerous water molecules to perform their physiological functions in biological organisms. Most water molecules act as a solvent medium; hence, their roles may be treated implicitly in theoretical models of protein structure and function. However, some water molecules interact intimately with proteins and must be treated explicitly to understand their effects. Most physics-based computational methods are limited in their ability to accurately locate water molecules on protein surfaces because of inaccurate energy functions. Instead of relying on an energy function, this study attempts to learn the locations of water molecules from structural data. GalaxyWater-CNN (convolutional neural network) predicts water positions on protein chains, at protein-protein interfaces, and at protein-compound binding sites using a 3D-CNN model trained to generate a water score map for a given protein structure. The training data are compiled from high-resolution protein crystal structures resolved together with water molecules. Compared with previous energy-based methods, GalaxyWater-CNN shows improved water prediction performance, both in the coverage of crystal water molecules and in the accuracy of the predicted water positions. In particular, it shows superior performance in precisely predicting water molecules that form hydrogen-bond networks. The web service and the source code of this water prediction method are freely available at https://galaxy.seoklab.org/gwcnn and https://github.com/seoklab/GalaxyWater-CNN, respectively.
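The abstract says the 3D-CNN outputs a water score map over a protein structure and that results are judged by coverage of crystal waters and positional accuracy, but it does not say how discrete water sites are read off the map. The sketch below is a hypothetical illustration only, not the GalaxyWater-CNN procedure: it assumes the map is a regular 3D grid (given `origin` and `spacing` in Å), extracts sites by greedy non-maximum suppression (keep the highest-scoring grid point, discard candidates closer than `min_dist`), and computes coverage as the fraction of crystal waters with a predicted site within `cutoff`. The function names and the values of `threshold`, `min_dist`, and `cutoff` are all assumptions.

```python
import math
from itertools import product


def pick_water_sites(score, origin, spacing, threshold=0.5, min_dist=2.0):
    """Hypothetical greedy peak picking on a 3D score grid.

    score[i][j][k] is the score at grid point (i, j, k); origin is the
    Cartesian position (Å) of point (0, 0, 0); spacing is the grid step (Å).
    Returns (x, y, z, score) tuples in descending score order.
    """
    nx, ny, nz = len(score), len(score[0]), len(score[0][0])
    candidates = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        s = score[i][j][k]
        if s >= threshold:  # assumed score cutoff for a candidate site
            xyz = (origin[0] + i * spacing,
                   origin[1] + j * spacing,
                   origin[2] + k * spacing)
            candidates.append((s, xyz))
    # Visit candidates from highest to lowest score.
    candidates.sort(key=lambda c: c[0], reverse=True)
    picked = []
    for s, xyz in candidates:
        # Non-maximum suppression: skip points too close to a kept site.
        if all(math.dist(xyz, p[:3]) >= min_dist for p in picked):
            picked.append((*xyz, s))
    return picked


def coverage(predicted, crystal, cutoff=1.0):
    """Fraction of crystal waters with a predicted site within cutoff Å."""
    if not crystal:
        return 0.0
    hits = sum(
        1 for w in crystal
        if any(math.dist(w, p[:3]) <= cutoff for p in predicted)
    )
    return hits / len(crystal)


# Toy 5x5x5 map with two peaks; (1,1,2) is a weaker neighbor of (1,1,1).
score = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
score[1][1][1] = 0.9
score[1][1][2] = 0.7
score[4][4][4] = 0.8
sites = pick_water_sites(score, origin=(0.0, 0.0, 0.0), spacing=1.0)
crystal = [(1.2, 1.0, 1.0), (0.0, 0.0, 4.0)]
print(sites)                      # [(1.0, 1.0, 1.0, 0.9), (4.0, 4.0, 4.0, 0.8)]
print(coverage(sites, crystal))   # 0.5
```

The greedy suppression step is one common way to turn a dense score map into discrete, non-overlapping sites. A practical version would likely use array libraries and interpolate peak positions below the grid spacing.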