ESAT-STADIUS, KU Leuven, Leuven, Belgium.
SWITCH Lab, KU Leuven, Leuven, Belgium.
PLoS Comput Biol. 2020 Apr 30;16(4):e1007722. doi: 10.1371/journal.pcbi.1007722. eCollection 2020 Apr.
Protein solubility is a key aspect of many biotechnological, biomedical and industrial processes, such as the production of active proteins and antibodies. In addition, understanding the molecular determinants of protein solubility may be crucial to shed light on the molecular mechanisms of diseases caused by aggregation processes such as amyloidosis. Here we present SKADE, a novel neural network protein solubility predictor, and we show how, thanks to its neural attention architecture, it can provide new insight into the mechanisms of protein solubility. First, we show that SKADE compares favorably with state-of-the-art tools while using only the protein sequence as input. Then, thanks to the neural attention mechanism, we use SKADE to investigate the patterns learned during training and to analyse its decision process. We use this property to show that, while the attention profiles do not correlate with obvious sequence aspects such as the biophysical properties of the amino acids, they suggest that the N- and C-termini are the most relevant regions for solubility prediction, and that they are predictive of complex emergent properties such as aggregation-prone regions involved in beta-amyloidosis and contact density. Moreover, SKADE is able to identify mutations that increase or decrease the overall solubility of the protein, allowing it to be used to perform large-scale in silico mutagenesis of proteins in order to maximize their solubility.
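The large-scale in silico mutagenesis described in the abstract can be illustrated with a saturation scan: every single-residue substitution is scored by a solubility predictor and ranked by its predicted change relative to the wild type. The sketch below is a minimal, hypothetical illustration and is not SKADE. `toy_solubility` is a placeholder scorer built on the Kyte-Doolittle hydropathy scale (an assumption made only so the example runs on its own), and `saturation_scan` is an illustrative helper name; a trained sequence-only predictor would be passed in its place.

```python
import math
from typing import Callable, List, Tuple

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale; used ONLY by the toy placeholder scorer below.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def toy_solubility(seq: str) -> float:
    """Hypothetical placeholder, NOT SKADE: squashes mean hydropathy into (0, 1),
    treating more hydrophobic sequences as less soluble."""
    mean_h = sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    return 1.0 / (1.0 + math.exp(mean_h))  # sigmoid(-mean_h)


def saturation_scan(
    seq: str, predict: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score every single-residue substitution of `seq` with `predict` and
    return (mutation, delta) pairs sorted from most to least solubilizing.
    Mutations use wild-type/1-based position/mutant notation, e.g. "V4R"."""
    wild_type_score = predict(seq)
    results = []
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the identity substitution
            mutant = seq[:i] + aa + seq[i + 1:]
            results.append((f"{wt}{i + 1}{aa}", predict(mutant) - wild_type_score))
    # Positive delta = predicted increase in solubility relative to wild type.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


if __name__ == "__main__":
    for name, delta in saturation_scan("MKLV", toy_solubility)[:3]:
        print(f"{name}: {delta:+.3f}")
```

Because the predictor is passed in as a plain function, the same scan works unchanged with any sequence-only model; its cost is 19 × L predictions for a sequence of length L, which is what makes a fast, sequence-only predictor attractive for proteome-scale scans.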