Archimedes, Athena Research Center, Marousi 15125, Greece.
Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens 16122, Greece.
J Chem Inf Model. 2024 Apr 8;64(7):2594-2611. doi: 10.1021/acs.jcim.3c01559. Epub 2024 Mar 29.
Water molecules are integral to the structural stability of proteins and vital for facilitating molecular interactions. However, accurately predicting their positions around protein structures remains a significant challenge and an active research area. In this paper, we introduce HydraProt (deep Hydration of Proteins), a novel methodology for predicting the positions of water molecule oxygen atoms around protein structures, leveraging two interconnected deep learning architectures: a 3D U-net and a Multi-Layer Perceptron (MLP). Our approach starts from a coarse voxel-based representation of the protein, which allows rapid sampling of candidate water positions via the 3D U-net. These candidate positions are then assessed by an MLP that embeds the water-protein relationship in Euclidean space. Finally, a postprocessing step further refines the MLP predictions. HydraProt surpasses existing state-of-the-art approaches in terms of precision and recall and has been validated on large data sets of protein structures. Notably, our method offers rapid inference runtime and should constitute the method of choice for protein structure studies and drug discovery applications. Our pretrained models, data, and the source code required to reproduce these results are accessible at https://github.com/azamanos/HydraProt.
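As an illustration of what a coarse voxel-based representation can look like, the following minimal sketch bins atom coordinates into an occupancy grid. It is not taken from the HydraProt source code: the function names `voxelize` and `voxel_center`, the 1.0 Å default voxel size, and the plain occupancy encoding are all assumptions made for this example, and the actual input channels used by the 3D U-net may differ.

```python
from math import floor


def voxelize(coords, voxel_size=1.0, padding=0.0):
    """Bin 3D atom coordinates into a coarse occupancy grid.

    Hypothetical helper for illustration only.
    Returns (origin, shape, occupied), where `occupied` is the set of
    (i, j, k) voxel indices that contain at least one atom.
    """
    if not coords:
        raise ValueError("coords must contain at least one atom")
    # Lower and upper corners of the bounding box, optionally padded.
    origin = tuple(min(c[a] for c in coords) - padding for a in range(3))
    upper = tuple(max(c[a] for c in coords) + padding for a in range(3))
    # Number of voxels needed along each axis to cover the box.
    shape = tuple(
        floor((upper[a] - origin[a]) / voxel_size) + 1 for a in range(3)
    )
    occupied = set()
    for c in coords:
        index = tuple(floor((c[a] - origin[a]) / voxel_size) for a in range(3))
        occupied.add(index)
    return origin, shape, occupied


def voxel_center(index, origin, voxel_size=1.0):
    """Map a voxel index back to the Cartesian coordinates of its center."""
    return tuple(origin[a] + (index[a] + 0.5) * voxel_size for a in range(3))


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (0.4, 0.2, 0.1), (2.5, 0.0, 0.0)]
    origin, shape, occupied = voxelize(atoms)
    print(shape, sorted(occupied))
```

In a pipeline of the kind the abstract describes, a grid like this would be the network input, and voxels flagged by the network would be mapped back to coordinates (here via `voxel_center`) to give candidate water positions for the later scoring stage.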