Yang Yang, Chen Mengqi, Liu Congrui, Vihinen Mauno
Computing Science and Artificial Intelligence College, Suzhou City University, Suzhou 215004, China.
School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
Int J Mol Sci. 2025 Jun 11;26(12):5604. doi: 10.3390/ijms26125604.
When globular proteins fold into their characteristic three-dimensional structures, some amino acids are located on the surface, while others are buried in the protein core, where they cannot interact with molecules in the environment. Predicting the degree of solvent accessibility of amino acids provides insight into the function and relevance of residues. Residue accessibility is crucial for several protein functions, including enzymatic activity, allostery, multimer formation, binding to other molecules, and immunogenicity. We developed a novel sequence-based predictor for amino acid accessibility with features derived from three-dimensional protein structures. Several machine learning algorithms were tested; the long short-term memory (LSTM) deep learning method performed best and was therefore used to develop the freely available SolAcc tool. In a blind test, SolAcc outperformed state-of-the-art predictors.
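The distinction between surface and core residues described above is commonly expressed as relative solvent accessibility: a residue's accessible surface area (ASA) divided by a maximum ASA for that residue type. The sketch below illustrates that idea only; it is not the SolAcc method. The `MAX_ASA` values, the 0.25 cutoff, and the function names are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List

# Illustrative maximum accessible surface areas in square angstroms.
# These numbers are assumptions for the sketch, not values from the paper.
MAX_ASA: Dict[str, float] = {"A": 129.0, "G": 104.0, "L": 201.0}


def relative_accessibility(
    residue: str, asa: float, max_asa: Dict[str, float] = MAX_ASA
) -> float:
    """Return ASA divided by the residue's maximum ASA, capped at 1.0."""
    return min(asa / max_asa[residue], 1.0)


def label_residues(
    sequence: str, asa_values: List[float], threshold: float = 0.25
) -> List[str]:
    """Label each residue 'exposed' or 'buried' using a hypothetical cutoff."""
    if len(sequence) != len(asa_values):
        raise ValueError("sequence and asa_values must have the same length")
    labels = []
    for residue, asa in zip(sequence, asa_values):
        rsa = relative_accessibility(residue, asa)
        labels.append("exposed" if rsa >= threshold else "buried")
    return labels
```

For example, `label_residues("AGL", [100.0, 10.0, 20.0])` labels the alanine as exposed and the other two residues as buried under these assumed values. A sequence-based predictor would aim to estimate such labels, or the underlying ratio, without access to the measured ASA.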