Lukasz Kurgan, Ke Chen
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada.
Biochem Biophys Res Commun. 2007 Jun 1;357(2):453-60. doi: 10.1016/j.bbrc.2007.03.164. Epub 2007 Apr 5.
Structural class characterizes the overall folding type of a protein or its domain. This paper develops an accurate method for the in silico prediction of structural classes from low-homology (twilight zone) protein sequences. The proposed LLSC-PRED method applies a linear logistic regression classifier to a custom-designed, feature-based sequence representation. The main advantages of LLSC-PRED are its comprehensive representation, which includes 58 features describing the composition and physicochemical properties of the sequences, and the transparency of its prediction model. The representation also includes predicted secondary structure content, thus exploring for the first time the synergy between these two related predictions. In tests on a large set of 1673 twilight zone domains, LLSC-PRED achieves a prediction accuracy of over 62%, which is better than the accuracy of over a dozen recently published competing in silico methods and similar to the accuracy of other, non-transparent classifiers that use the proposed representation.
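The abstract describes the model only at a high level. As a minimal sketch, assuming a much smaller hypothetical feature set (20 amino acid composition values plus predicted helix and strand fractions, rather than the paper's 58 features) and plain stochastic gradient descent as the fitting procedure (the paper's actual training procedure is not stated here), a multinomial linear logistic regression over such a representation could look like the following. The toy sequences, their secondary structure fractions, and the names `features`, `train`, and `predict` are illustrative assumptions, not part of LLSC-PRED.

```python
import math

# Standard 20-letter amino acid alphabet and the four structural classes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]


def composition(seq):
    """Fraction of each standard amino acid in seq (non-standard letters ignored)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def features(seq, helix_frac, strand_frac):
    """Hypothetical 22-feature vector: 20 composition values plus the
    predicted helix and strand content (assumed to come from an external
    secondary structure predictor). Not the paper's 58-feature set."""
    return composition(seq) + [helix_frac, strand_frac]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict_proba(W, b, x):
    """Class probabilities of a multinomial (softmax) logistic regression."""
    z = [sum(wj * xj for wj, xj in zip(w, x)) + bk for w, bk in zip(W, b)]
    return softmax(z)


def predict(W, b, x):
    """Index of the most probable class."""
    p = predict_proba(W, b, x)
    return max(range(len(p)), key=p.__getitem__)


def train(X, y, n_classes, lr=0.5, epochs=500):
    """Fit weights by plain stochastic gradient descent on cross-entropy loss
    (an assumed fitting procedure, chosen for brevity)."""
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = predict_proba(W, b, x)
            for k in range(n_classes):
                g = p[k] - (1.0 if k == t else 0.0)  # dLoss/dz_k
                b[k] -= lr * g
                for j in range(d):
                    W[k][j] -= lr * g * x[j]
    return W, b


# Made-up toy examples, one per class: (sequence, helix, strand, label).
# Real training would use labelled protein domains instead.
TOY = [
    ("AAAAEEEELLLLKKKK", 0.70, 0.05, 0),  # all-alpha
    ("VVVVIIIITTTTYYYY", 0.05, 0.60, 1),  # all-beta
    ("GGGGSSSSDDDDNNNN", 0.40, 0.30, 2),  # alpha/beta
    ("PPPPQQQQRRRRHHHH", 0.35, 0.35, 3),  # alpha+beta
]

if __name__ == "__main__":
    X = [features(s, h, e) for s, h, e, _ in TOY]
    y = [label for *_, label in TOY]
    W, b = train(X, y, len(CLASSES))
    for (seq, *_), x in zip(TOY, X):
        print(seq, "->", CLASSES[predict(W, b, x)])
```

Because the model is linear, each row of `W` holds one weight per feature for one class, so the contribution of, for example, predicted helix content to the all-alpha score can be read off directly. This is one plausible reading of the transparency the abstract emphasizes, in contrast to non-transparent classifiers trained on the same representation.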