School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, People's Republic of China.
Amino Acids. 2012 Jan;42(1):271-83. doi: 10.1007/s00726-010-0805-y. Epub 2010 Nov 17.
Proteins fold through either a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features that combine information on amino acid composition, predicted secondary structure, and predicted solvent accessibility. FOKIT provides predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91, measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that the inclusion of solvent accessibility helps to discriminate between the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure or burial of certain hydrophobic residues may play an important role in the formation of folding intermediates. Our conclusions are supported by two case studies.
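As an illustration of the two quantitative ingredients named in the abstract, the following minimal Python sketch shows how a logistic regression score and the MCC are computed in general. It is not the FOKIT implementation: the function names, the weights, the bias, and the choice of MS as the positive class are assumptions made for this example, and the abstract does not specify the six features or their coefficients.

```python
import math


def predict_ms_probability(features, weights, bias):
    """Generic logistic regression score.

    Returns the modelled probability of the positive class (assumed here
    to be "multi-state folder"). The weights and bias are hypothetical;
    the abstract does not report FOKIT's coefficients.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = MS, 0 = TS; assumed coding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative labels only; these are not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(mcc(*confusion_counts(y_true, y_pred)))
```

The MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement), which makes it a reasonable summary when the TS and MS classes are unevenly represented.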