Ding Yong-Sheng, Zhang Tong-Liang, Gu Quan, Zhao Pei-Ying, Chou Kuo-Chen
College of Information Sciences and Technology, Donghua University, Shanghai, China.
Protein Pept Lett. 2009;16(5):552-60. doi: 10.2174/092986609788167833.
Prediction of protein secondary structure has been pursued by many previous investigators, but it is still worth revisiting owing to its importance in protein science. Several studies indicate that knowledge of a protein's structural class can provide useful information for determining its secondary structure. In particular, the performance of recently developed prediction algorithms has improved rapidly through the incorporation of multiple sequence alignment information from homologous sequences. Unfortunately, this kind of information is not available for a significant number of proteins. It is therefore necessary to develop methods based on the query protein sequence alone, the so-called single-sequence methods. Here, we propose a novel single-sequence approach with two distinguishing features: various kinds of contextual information are taken into account, and a maximum entropy model classifier is used as the prediction engine. As a demonstration, cross-validation tests were performed with the new method on datasets containing proteins from different structural classes. The results obtained are quite promising, indicating that the new method may become a useful tool in protein science, or at least play a complementary role to existing protein secondary structure prediction methods.
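The abstract does not specify the feature set, label scheme, or training procedure, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that "contextual information" means the residue identities in a sliding window around each position, that secondary structure uses three states (H, E, C), and that the conditional maximum entropy model is trained by plain stochastic gradient ascent. With binary indicator features, such a model is equivalent to multinomial logistic regression. The names `window_features`, `residue_examples`, `MaxEnt`, and `cross_validate`, the window half-width of 3, the learning rate, and the epoch count are all hypothetical choices.

```python
import math
from collections import defaultdict

# Hypothetical 3-state label set: H = helix, E = strand, C = coil.
LABELS = ("H", "E", "C")


def window_features(seq, i, half_window=3):
    """Binary context features for residue i: a bias plus residue identities at offsets -w..+w."""
    feats = ["bias"]
    for off in range(-half_window, half_window + 1):
        j = i + off
        aa = seq[j] if 0 <= j < len(seq) else "-"  # '-' pads positions past the chain ends
        feats.append(f"{off}:{aa}")
    return feats


def residue_examples(seq, ss, half_window=3):
    """Turn one protein (sequence, per-residue labels) into (features, label) pairs."""
    return [(window_features(seq, i, half_window), ss[i]) for i in range(len(seq))]


class MaxEnt:
    """Conditional maximum entropy model: p(y | x) proportional to exp(sum of w[f, y] over active f)."""

    def __init__(self, labels=LABELS):
        self.labels = labels
        self.w = defaultdict(float)  # weights keyed by (feature, label); unseen pairs default to 0

    def probs(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max before exponentiating for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[p.index(max(p))]

    def fit(self, data, epochs=50, lr=0.5):
        """Stochastic gradient ascent on the conditional log-likelihood (unregularized, for brevity)."""
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y, py in zip(self.labels, p):
                    g = (1.0 if y == gold else 0.0) - py  # observed minus expected feature count
                    for f in feats:
                        self.w[(f, y)] += lr * g
        return self


def cross_validate(proteins, k=3, half_window=3, epochs=50):
    """k-fold cross-validation split by protein (not residue); returns per-residue Q3 accuracy."""
    correct = total = 0
    for fold in range(k):
        train = [p for n, p in enumerate(proteins) if n % k != fold]
        test = [p for n, p in enumerate(proteins) if n % k == fold]
        data = [ex for seq, ss in train for ex in residue_examples(seq, ss, half_window)]
        model = MaxEnt().fit(data, epochs=epochs)
        for seq, ss in test:
            for feats, gold in residue_examples(seq, ss, half_window):
                correct += int(model.predict(feats) == gold)
                total += 1
    return correct / total
```

For example, training on a toy chain `"A" * 10 + "V" * 10 + "G" * 10` labelled `"H" * 10 + "E" * 10 + "C" * 10` should make `predict(window_features("AAAAAAA", 3))` return `"H"`. The cross-validation folds split by whole protein rather than by residue. This keeps overlapping windows from the same chain out of both the training and test sets. The sketch ignores structural-class information, which the abstract cites as useful, and a real evaluation would use a curated benchmark dataset instead of toy sequences.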