Chen Chi-Wei, Chang Kai-Po, Ho Cheng-Wei, Chang Hsung-Pin, Chu Yen-Wei
Department of Computer Science and Engineering, National Chung Hsing University, Kuo Kuang Rd., Taichung 402, Taiwan.
Institute of Genomics and Bioinformatics, National Chung Hsing University, Kuo Kuang Rd., Taichung 402, Taiwan.
Entropy (Basel). 2018 Dec 19;20(12):988. doi: 10.3390/e20120988.
Thermostability is a protein property that impacts many types of studies, including protein activity enhancement, protein structure determination, and drug development. However, most computational tools designed to predict protein thermostability require tertiary structure data as input. The few tools that are dependent only on the primary structure of a protein to predict its thermostability have one or more of the following problems: a slow execution speed, an inability to make large-scale mutation predictions, and the absence of temperature and pH as input parameters. Therefore, we developed a computational tool, named KStable, that is sequence-based, computationally rapid, and includes temperature and pH values to predict changes in the thermostability of a protein upon the introduction of a mutation at a single site. KStable was trained using basis features and minimal redundancy-maximal relevance (mRMR) features, and 58 classifiers were subsequently tested. To find the representative features, a regular-mRMR method was developed. When KStable was evaluated with an independent test set, it achieved an accuracy of 0.708.
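The abstract names mRMR feature selection but does not describe how it works, and it gives no details of the regular-mRMR variant. As a rough illustration only, the sketch below shows the standard greedy mRMR idea (difference criterion) on discrete features: at each step, pick the feature with the highest mutual information with the label minus its mean mutual information with the features already chosen. This is not the KStable implementation; the function names, the feature names, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR selection (difference criterion).

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of discrete class labels, same length as each feature.
    k:        number of features to select.
    """
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = set(features)

    def score(name):
        # Relevance to the label minus mean redundancy with selected features.
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "a_copy" duplicates "a", so it is fully redundant
# once "a" has been selected; "c" is weaker but less redundant.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 1, 1],
}
chosen = mrmr_select(toy_features, toy_labels, 2)
```

In the toy data, the duplicated feature is penalized by its redundancy with the first pick, which is the behavior that motivates mRMR over ranking features by relevance alone. Continuous features such as temperature and pH would need to be discretized, or a different mutual information estimator used, before a sketch like this applies.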