Chen Chi-Wei, Lin Meng-Han, Liao Chi-Chou, Chang Hsung-Pin, Chu Yen-Wei
Department of Computer Science and Engineering, National Chung-Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan.
Institute of Genomics and Bioinformatics, National Chung Hsing University, 145 Xingda Rd., South Dist., Taichung City 402, Taiwan.
Comput Struct Biotechnol J. 2020 Mar 6;18:622-630. doi: 10.1016/j.csbj.2020.02.021. eCollection 2020.
Protein mutations can lead to structural changes that affect protein function and cause disease. In the protein engineering, drug design, and optimization industries, mutations are often used to improve protein stability or to change protein properties while maintaining stability. To provide possible candidates for novel protein design, several computational tools for predicting protein stability changes have been developed. Although many prediction tools are available, each employs different algorithms and features. This can produce conflicting prediction results that make it difficult for users to decide upon the correct protein design. Therefore, this study proposes an integrated prediction tool, iStable 2.0, which integrates 11 sequence-based and structure-based prediction tools by machine learning and adds protein sequence information as features. Three coding modules are designed for the system (an Online Server Module, a Stand-alone Module, and a Sequence Coding Module) to improve on the prediction performance of the previous version of the system. The final integrated structure-based classification model has a higher Matthews correlation coefficient than that of the single prediction tool (0.708 vs 0.547, respectively), and the Pearson correlation coefficient of the regression model likewise improves from 0.669 to 0.714. The sequence-based model not only successfully integrates off-the-shelf predictors but also improves the Matthews correlation coefficient of the best single prediction tool by at least 0.161, which is better than the individual structure-based prediction tools. In addition, both the Sequence Coding Module and the Stand-alone Module maintain performance, with only a 5% decrease in the Matthews correlation coefficient, when the integrated online tools are unavailable. iStable 2.0 is available at http://ncblab.nchu.edu.tw/iStable2.
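The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of the Matthews correlation coefficient (for classification) and the Pearson correlation coefficient (for regression). The function names, the label encoding (1 for stabilizing, 0 for destabilizing), and the toy values are assumptions made for illustration only; they are not taken from iStable 2.0 or its datasets.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding (hypothetical): 1 = stabilizing, 0 = destabilizing.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Undefined for constant input; return 0.0 in that case.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


if __name__ == "__main__":
    # Toy values, invented for illustration only.
    true_labels = [1, 1, 0, 0, 1, 0]
    predicted_labels = [1, 0, 0, 0, 1, 1]
    print("MCC:", matthews_corrcoef(true_labels, predicted_labels))

    measured_change = [-1.2, 0.4, -2.1, 0.9]
    predicted_change = [-0.8, 0.1, -1.7, 1.3]
    print("Pearson r:", pearson_r(measured_change, predicted_change))
```

Both metrics range from -1 to 1. The Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when the two classes are unevenly represented.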