Yu Yang, Wang Zhe, Wang Lingling, Tian Sheng, Hou Tingjun, Sun Huiyong
Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China.
Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China.
J Cheminform. 2022 Aug 20;14(1):56. doi: 10.1186/s13321-022-00639-y.
Protein mutations occur frequently in biological systems and may affect, for example, the binding of drugs to their targets by disrupting critical hydrogen bonds or altering hydrophobic interactions. Accurately predicting the effects of mutations on biological systems is therefore of great interest to many fields. Unfortunately, large-scale wet-lab mutation experiments remain impractical because of their prohibitive time and financial costs. In silico computation can instead be used to guide the experiments. Numerous pioneering studies have sought to predict mutation effects accurately, using approaches that range from computationally cheap machine-learning (ML) methods to the more expensive alchemical methods. However, these methods usually either fail to yield a physically interpretable model (ML-based methods) or require huge computational resources (alchemical methods). Compromise methods that combine a sound physical basis with high computational efficiency are therefore desirable. Here, we conducted a comprehensive investigation of mutation effects in biological systems using the well-known end-point binding free energy calculation methods MM/GBSA and MM/PBSA. We examined different computational strategies, varying the length of the MD simulations, the value of the dielectric constant, and whether entropy effects are included in the predicted total binding affinity, to identify a more accurate way of predicting the energetic change upon protein mutation. Overall, our results show that a relatively long MD simulation (e.g. 100 ns) improves the prediction accuracy of both MM/GBSA and MM/PBSA (the best Pearson correlation coefficient between the predicted ∆∆G and the experimental data is ~0.44 for a challenging dataset). Further analyses show that systems involving large perturbations (e.g. multiple mutations and a large change in the number of atoms at the mutation site) are much easier to predict accurately, because the algorithm is more sensitive to large changes in the system. In addition, system-specific investigation reveals that conformational adjustment is needed to refine the micro-environment of the manually mutated systems, which explains why longer MD simulations are necessary to improve the predictions. The proposed strategy is expected to be applicable to large-scale, interpretable investigations of mutation effects.
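The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard end-point decomposition ∆G_bind = ∆E_MM + ∆G_solv − T∆S (with the entropy term optional), defines ∆∆G as ∆G_bind(mutant) − ∆G_bind(wild type), and scores predictions with the Pearson correlation coefficient. The function names and all numbers are hypothetical placeholders chosen for illustration; they are not the authors' code or data.

```python
import math


def binding_free_energy(delta_e_mm, delta_g_solv, t_delta_s=None):
    """End-point estimate: dG_bind = dE_MM + dG_solv - T*dS.

    The entropy term is optional, mirroring the choice of whether
    to include entropy effects in the total binding affinity.
    """
    dg = delta_e_mm + delta_g_solv
    if t_delta_s is not None:
        dg -= t_delta_s
    return dg


def ddg(dg_mutant, dg_wild_type):
    """Change in binding free energy upon mutation (assumed sign convention)."""
    return dg_mutant - dg_wild_type


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical energies in kcal/mol, for illustration only.
    dg_wt = binding_free_energy(-55.0, 25.0)
    dg_mut = binding_free_energy(-52.0, 24.5)
    print("predicted ddG:", ddg(dg_mut, dg_wt))

    # Hypothetical predicted vs. experimental ddG values.
    predicted = [0.5, 1.8, -0.3, 2.4, 0.9]
    experimental = [0.2, 1.1, 0.4, 1.9, -0.1]
    print("Pearson r:", pearson_r(predicted, experimental))
```

In practice each energy term would be an average over MD snapshots, so the simulation length determines which frames enter that average.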