Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA.
Int J Mol Sci. 2021 Jan 9;22(2):606. doi: 10.3390/ijms22020606.
Modeling the effect of mutations on protein thermodynamic stability is useful for protein engineering and for understanding the molecular mechanisms of disease-causing variants. Here, we report a new development of the SAAFEC method, SAAFEC-SEQ, a gradient boosting decision tree machine learning method that predicts the change in folding free energy caused by amino acid substitutions. The method requires only the sequence of the corresponding protein, not its 3D structure, and can therefore be applied to genome-scale investigations, where structural information is very sparse. SAAFEC-SEQ uses physicochemical properties, sequence features, and evolutionary information features to make its predictions. Benchmarked on several independent datasets, it consistently outperforms all existing state-of-the-art sequence-based methods in both Pearson correlation coefficient and root-mean-square error. SAAFEC-SEQ has been implemented as a web server and is also available as stand-alone code that can be downloaded and embedded into other researchers' code.
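The two benchmark metrics named in the abstract, the Pearson correlation coefficient and the root-mean-square error, can be computed from paired predicted and experimental folding free energy changes. The following is a minimal sketch using only the Python standard library; it is not the SAAFEC-SEQ code. The function names `pearson_r` and `rmse` are hypothetical, the numeric values are made-up placeholders rather than results from the paper, and the kcal/mol unit is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Placeholder folding free energy changes (assumed kcal/mol).
    # These numbers are illustrative only and do not come from the paper.
    experimental = [-1.2, 0.3, -2.5, -0.8, 0.1]
    predicted = [-1.0, 0.0, -2.0, -1.1, 0.4]
    print(f"PCC  = {pearson_r(predicted, experimental):.3f}")
    print(f"RMSE = {rmse(predicted, experimental):.3f}")
```

A higher correlation and a lower error both indicate better agreement with experiment, which is why the two are typically reported together when comparing predictors.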