Predicting Conserved Water Molecules in Binding Sites of Proteins Using Machine Learning Methods and Combining Features.

Affiliation

School of Electronic and Information, Shanghai Dianji University, Shanghai 201306, China.

Publication information

Comput Math Methods Med. 2022 Oct 3;2022:5104464. doi: 10.1155/2022/5104464. eCollection 2022.

Abstract

Water molecules play an important role in many biological processes: they stabilize protein structures, assist protein folding, and improve binding affinity. It is well known that, owing to various environmental factors, conserved water molecules (CWMs) are difficult to distinguish directly from free water molecules (FWMs), as CWMs are normally deeply embedded in proteins and form strong hydrogen bonds with surrounding polar groups. The abundance of spatial structure information and physicochemical properties available for water molecules in proteins motivates the use of machine learning to identify CWMs. This study presents a machine learning framework for identifying CWMs in the binding sites of proteins. First, six features (atom density, hydrophilicity, hydrophobicity, solvent-accessible surface area, temperature B-factor, and mobility) were extracted by analyzing the physicochemical properties and spatial structure information of water molecules. These features were then analyzed and combined to reach a higher CWM identification rate, and an optimal feature combination was determined. Based on this combination, seven machine learning models were evaluated for their ability to classify water molecules as CWMs or FWMs: support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), discriminant analysis (DA), naïve Bayes (NB), and ensemble learning (EL). The EL model was selected as the prediction model because of its comprehensive advantages. The methodology was validated through a case study of crystal 3skh and compared extensively with Dowser++. The results showed that the optimal feature combination, together with the EL model, achieved satisfactory accuracy in distinguishing CWMs from FWMs in the binding sites of proteins.
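
The feature-combination step described in the abstract can be illustrated with a small sketch. The abstract does not say how combinations were searched or scored, so the code below assumes an exhaustive search over all non-empty subsets of the six features, scored by leave-one-out accuracy of a plain k-nearest-neighbor classifier (one of the seven model families named above, used here only as a stand-in). The data are synthetic, and every function name, the choice of informative features, and k = 3 are hypothetical choices made for illustration, not the authors' method.

```python
import itertools
import math
import random

# The six features named in the abstract (identifier spellings are ours).
FEATURES = ["atom_density", "hydrophilicity", "hydrophobicity",
            "sasa", "b_factor", "mobility"]


def make_synthetic_waters(n_per_class=30, seed=0):
    """Synthetic stand-in data, NOT real measurements. Label 1 = CWM, 0 = FWM."""
    rng = random.Random(seed)
    rows = []
    for label in (0, 1):
        # Assumption for illustration only: three features carry signal.
        shift = [1.5 * label, 0.0, 0.0, 0.0, -1.5 * label, -1.0 * label]
        for _ in range(n_per_class):
            rows.append(([rng.gauss(s, 1.0) for s in shift], label))
    return rows


def knn_predict(train, query, cols, k=3):
    """Majority vote of the k nearest training points, using only columns `cols`."""
    dists = sorted(
        (math.dist([x[c] for c in cols], [query[c] for c in cols]), y)
        for x, y in train
    )
    votes = sum(y for _, y in dists[:k])
    return 1 if 2 * votes > k else 0


def loo_accuracy(rows, cols, k=3):
    """Leave-one-out accuracy for one feature subset."""
    correct = 0
    for i, (x, y) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        correct += knn_predict(train, x, cols, k) == y
    return correct / len(rows)


def all_feature_subsets(n):
    """Yield every non-empty subset of feature indices 0..n-1."""
    for r in range(1, n + 1):
        yield from itertools.combinations(range(n), r)


def best_combination(rows):
    """Score every subset; prefer higher accuracy, then fewer features."""
    scored = [(loo_accuracy(rows, cols), cols)
              for cols in all_feature_subsets(len(FEATURES))]
    best_acc, best_cols = max(scored, key=lambda s: (s[0], -len(s[1])))
    return best_acc, [FEATURES[c] for c in best_cols], len(scored)


if __name__ == "__main__":
    acc, names, n_tried = best_combination(make_synthetic_waters())
    print(f"tried {n_tried} subsets; best = {names} (LOO accuracy {acc:.2f})")
```

With six features there are only 2^6 - 1 = 63 non-empty subsets, so exhaustive enumeration is cheap; for a much larger feature set a greedy or regularized selection would be the more practical choice.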

