Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States.
Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, Hubei 430079, China.
Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae544.
Mutations in protein-protein interactions can affect the corresponding complexes, impacting function and potentially leading to disease. Given the abundance of membrane proteins, it is crucial to assess the impact of mutations on the binding affinity of these proteins. Although several methods exist to predict the binding free energy change due to mutations in protein-protein complexes, most require structural information of the protein complex and are primarily trained on the SKEMPI database, which is composed mainly of soluble proteins.
A novel sequence-based method (SAAMBE-MEM) has been developed for predicting mutation-induced binding free energy changes (ΔΔG) in membrane protein-protein complexes. The method uses the MPAD database, which contains binding affinities for wild-type and mutant membrane protein complexes. A machine learning model was built to predict ΔΔG from features such as amino acid indices and position-specific scoring matrices (PSSM). After extensive dataset curation and feature extraction, SAAMBE-MEM was trained and validated using the XGBoost regression algorithm. With the optimal feature set, which includes PSSM-related features, the model achieved a Pearson correlation coefficient of 0.64, outperforming existing methods trained on the SKEMPI database. Furthermore, SAAMBE-MEM performed much better with evolution-based features than with physicochemical features.
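To make the two quantities above concrete, here is a minimal sketch of how ΔΔG labels could be derived from measured dissociation constants and how a Pearson correlation between predicted and experimental values is computed. It is not the SAAMBE-MEM code: the function names, the assumption that affinities are given as Kd in mol/L, the temperature of 298.15 K, and the sign convention ΔΔG = ΔG(mutant) − ΔG(wild type) are all illustrative assumptions.

```python
import math

# Gas constant in kcal/(mol*K); the choice of units is an assumption for this sketch.
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Binding free energy from a dissociation constant: dG = R*T*ln(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg(kd_wild, kd_mutant, temperature_k=298.15):
    """ddG = dG(mutant) - dG(wild type); positive means weaker binding
    under this (assumed) sign convention."""
    return (binding_free_energy(kd_mutant, temperature_k)
            - binding_free_energy(kd_wild, temperature_k))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In this sketch, a mutation that weakens binding tenfold (for example Kd from 1e-9 M to 1e-8 M) gives a ΔΔG of roughly 1.36 kcal/mol at 298.15 K, and `pearson(predicted, experimental)` is the kind of statistic behind the reported value of 0.64.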
The method is accessible via a web server and as standalone code at http://compbio.clemson.edu/SAAMBE-MEM/. The cleaned MPAD database is available on the same website.