Liu Yan, Carbonell Jaime, Klein-Seetharaman Judith, Gopalakrishnan Vanathi
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Bioinformatics. 2004 Nov 22;20(17):3099-107. doi: 10.1093/bioinformatics/bth370. Epub 2004 Jun 24.
Protein secondary structure prediction is an important step towards understanding how proteins fold in three dimensions. Recent information-theoretic analysis indicates that the correlation between neighboring secondary structures is much stronger than that between neighboring amino acids. In this article, we focus on the combination problem for sequences, i.e. combining the scores or assignments from single or multiple prediction systems under the constraint of a whole sequence, as a target for improvement in protein secondary structure prediction.
We apply several graphical chain models to solve the combination problem and show that they are consistently more effective than the traditional window-based methods. In particular, conditional random fields (CRFs) moderately improve the predictions for helices and, more importantly, for beta sheets, which are the major bottleneck for protein secondary structure prediction.
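To illustrate why a chain model can outperform independent per-residue decisions, the following is a minimal sketch of Viterbi decoding over a linear chain. It is not the authors' CRF: nothing is trained, and the three-state alphabet (H, E, C), the per-residue scores, the switch penalty, and the function name `viterbi_combine` are all assumptions chosen for illustration. The per-residue scores stand in for the output of a window-based predictor, and the transition scores stand in for the correlation between neighboring secondary structure labels.

```python
from typing import Dict, List

STATES = ["H", "E", "C"]  # helix, strand, coil (assumed 3-state alphabet)


def viterbi_combine(scores: List[Dict[str, float]],
                    transition: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence for a linear chain.

    scores[t][s]      additive score for label s at residue t (hypothetical input)
    transition[p][s]  additive score for moving from label p to label s
    """
    if not scores:
        return []
    best = [{s: scores[0][s] for s in STATES}]  # best path score ending in s
    back = []                                   # back-pointers per position
    for t in range(1, len(scores)):
        row, ptr = {}, {}
        for cur in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + transition[p][cur])
            row[cur] = best[-1][prev] + transition[prev][cur] + scores[t][cur]
            ptr[cur] = prev
        best.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up scores: residue 2 slightly prefers strand inside a helical stretch.
helix_like = {"H": 2.0, "E": 0.0, "C": 0.0}
per_residue = [helix_like, helix_like,
               {"H": 1.0, "E": 1.5, "C": 0.0},
               helix_like, helix_like]

switch_penalty = -2.0  # assumed cost of changing label between neighbors
transition = {p: {s: (0.0 if p == s else switch_penalty) for s in STATES}
              for p in STATES}

# Independent decision per residue versus whole-sequence decoding.
independent = [max(STATES, key=lambda s: r[s]) for r in per_residue]
combined = viterbi_combine(per_residue, transition)

print("".join(independent))  # HHEHH
print("".join(combined))     # HHHHH
```

In this toy input, the independent decision produces an isolated strand residue inside a helix, while decoding over the whole sequence removes it because two label switches cost more than the small per-residue gain. A trained CRF would learn such transition and observation weights from data rather than fixing them by hand.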