Improved performance in protein secondary structure prediction by inhomogeneous score combination.

Author information

Guermeur Y, Geourjon C, Gallinari P, Deléage G

Affiliation

LIP, Ecole Normale Supérieure de Lyon, 46, Allée d'Italie, 69364 Lyon cedex 07, France.

Publication information

Bioinformatics. 1999 May;15(5):413-21. doi: 10.1093/bioinformatics/15.5.413.

PMID:10366661

Abstract

MOTIVATION

In many fields of pattern recognition, combining predictors has proved effective at increasing the generalization performance of individual prediction methods. Numerous systems based on different principles have been developed for protein secondary structure prediction, so finding better ensemble methods for this task may become crucial. Furthermore, efforts need to be made to help the biologist post-process the outputs.

RESULTS

An ensemble method has been designed to post-process the outputs of discriminant models, in order to improve prediction accuracy while generating class posterior probability estimates. Experimental results show that it can increase the recognition rate of protein secondary structure prediction methods that provide inhomogeneous scores, even when their individual prediction accuracies differ widely. The combination is thus useful to the biologist, who can apply it with confidence on top of any set of prediction methods. Moreover, the resulting estimates can be used in various ways, for instance to determine which areas in the sequence are predicted with a given level of reliability.
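The abstract does not describe the combiner itself, so the following is only an illustrative sketch of the two ideas it mentions: turning inhomogeneous scores from several methods into class posterior estimates, and using those estimates to flag regions predicted at a given reliability level. It assumes a simple linear combination followed by a softmax, a three-state alphabet (H, E, C), and weights that would in practice be fitted on training data. All function names and parameters here are hypothetical and are not taken from the authors' source code.

```python
import math

CLASSES = ("H", "E", "C")  # helix, strand, coil: an assumed 3-state alphabet


def softmax(values):
    """Map real-valued scores to probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def combine(scores, weights, bias):
    """Combine the concatenated scores of several methods for one residue.

    scores  : flat list of raw scores from all methods (any scale)
    weights : one weight row per class, each as long as `scores`
    bias    : one offset per class
    Returns estimated class posteriors (hypothetical linear-softmax combiner).
    """
    logits = [
        sum(w * s for w, s in zip(row, scores)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def reliable_segments(posteriors, threshold):
    """Return (start, end, label) runs whose top posterior >= threshold.

    Indices are 0-based and `end` is exclusive.
    """
    segments = []
    start = label = None
    for i, probs in enumerate(posteriors):
        best = max(range(len(probs)), key=probs.__getitem__)
        current = CLASSES[best] if probs[best] >= threshold else None
        if current != label:
            if label is not None:  # close the run that just ended
                segments.append((start, i, label))
            start, label = i, current
    if label is not None:  # close a run that reaches the sequence end
        segments.append((start, len(posteriors), label))
    return segments
```

For example, `reliable_segments(posteriors, 0.7)` would list the stretches of the sequence where the most probable state has an estimated posterior of at least 0.7, leaving lower-confidence residues unassigned.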

AVAILABILITY

The prediction is freely available over the Internet on the Network Protein Sequence Analysis (NPS@) WWW server at http://pbil.ibcp.fr/NPSA/npsa_server.html. The source code of the combiner can be obtained on request for academic use.
