Attanasio Simone, Kwasigroch Jean, Rooman Marianne, Pucci Fabrizio
Computational Biology and Bioinformatics, Université Libre de Bruxelles, Ixelles, 1050, Belgium.
Interuniversity Institute of Bioinformatics in Brussels, Brussels, 1050, Belgium.
Sci Rep. 2025 Jul 29;15(1):27531. doi: 10.1038/s41598-025-11326-x.
Protein solubility problems arise in a wide range of applications, from antibody development to enzyme production, and are linked to several major disorders, including cataracts and Alzheimer's disease. To assist scientists in designing proteins with improved solubility and to better understand solubility-related diseases, we introduce SOuLMuSiC, a computational tool for the fast and accurate prediction of the impact of single-site mutations on protein solubility. Our model is based on a simple artificial neural network that takes as input a series of features, including biophysical properties of wild-type and mutated residues, energetic values computed using various statistical potentials, and mutational scores derived from protein language models. SOuLMuSiC has been trained on a curated dataset of about 700 single-site mutations with known solubility values, collected and manually verified from the original literature. It significantly outperforms current state-of-the-art predictors in strict cross-validation: the Spearman correlation reaches 0.5 when solubility changes are represented categorically, and increases to 0.7 for the subset with quantitative values. SOuLMuSiC also performs well on external datasets containing high-throughput enzyme solubility-related data as well as protein aggregation propensities. In summary, SOuLMuSiC is a valuable tool for identifying mutations that impact protein solubility, and can play a major role in the rational design of proteins with improved solubility and in understanding the effects of genetic variants. It is freely available for academic use at http://babylone.ulb.ac.be/SoulMuSiC/.
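The Spearman correlation reported in the abstract is a rank correlation, so it can be computed on categorical solubility labels provided that tied values receive averaged ranks. The following is a minimal Python sketch of that calculation, not the authors' evaluation code. The function names, the -1/0/+1 label encoding, and the toy values are illustrative assumptions and are not taken from the SOuLMuSiC dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical encoding: -1 = less soluble, 0 = unchanged, +1 = more soluble.
    observed_labels = [-1, -1, 0, 0, 1, 1, -1, 1]
    # Made-up predictor scores for the same eight mutations.
    predicted_scores = [-0.8, 0.1, -0.2, 0.3, 0.9, 0.4, -0.5, -0.1]
    print(f"Spearman correlation: {spearman(observed_labels, predicted_scores):.2f}")
```

Because categorical labels produce many ties, the tie-averaged ranking step matters here; with fully quantitative solubility values the ranks are mostly distinct and the same function applies unchanged.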