Guermeur Y, Geourjon C, Gallinari P, Deléage G
LIP, Ecole Normale Supérieure de Lyon, 46, Allée d'Italie, 69364 Lyon cedex 07, France.
Bioinformatics. 1999 May;15(5):413-21. doi: 10.1093/bioinformatics/15.5.413.
In many fields of pattern recognition, combining predictors has proved effective at improving on the generalization performance of individual prediction methods. Numerous systems, based on different principles, have been developed for protein secondary structure prediction. Finding better ensemble methods for this task may thus become crucial. Furthermore, efforts need to be made to help the biologist post-process the outputs.
An ensemble method has been designed to post-process the outputs of discriminant models, in order to improve prediction accuracy while generating class posterior probability estimates. Experimental results establish that it can increase the recognition rate of protein secondary structure prediction methods that provide inhomogeneous scores, even when their individual prediction accuracies differ widely. The combination thus helps the biologist, who can use it confidently on top of any set of prediction methods. Moreover, the resulting estimates can be used in various ways, for instance to determine which areas in the sequence are predicted with a given level of reliability.
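The idea of turning inhomogeneous per-method scores into class posterior estimates, then reading off the reliably predicted regions, can be illustrated with a minimal sketch. The abstract does not specify the combiner, so everything below is an assumption made for illustration: the three-state labels (H, E, C), the softmax rescaling of each method's raw scores, the fixed convex weights, and the function names `softmax`, `combine` and `reliable_positions`. It is a simple linear opinion pool, not the authors' method.

```python
import math

# Hypothetical three-state labelling: helix, strand, coil.
CLASSES = ("H", "E", "C")


def softmax(scores):
    """Map one method's raw scores to a probability vector (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(per_method_scores, weights):
    """Convex combination of the rescaled scores of several methods for one residue.

    per_method_scores: one list of len(CLASSES) raw scores per method.
    weights: one non-negative weight per method (assumed given, not learned here).
    """
    if len(per_method_scores) != len(weights):
        raise ValueError("need exactly one weight per method")
    weight_sum = sum(weights)
    posterior = [0.0] * len(CLASSES)
    for scores, w in zip(per_method_scores, weights):
        for k, p in enumerate(softmax(scores)):
            posterior[k] += (w / weight_sum) * p
    return posterior


def reliable_positions(sequence_scores, weights, threshold):
    """Return (index, class, estimate) for residues whose top estimate meets the threshold."""
    selected = []
    for i, per_method in enumerate(sequence_scores):
        posterior = combine(per_method, weights)
        best = max(range(len(CLASSES)), key=posterior.__getitem__)
        if posterior[best] >= threshold:
            selected.append((i, CLASSES[best], posterior[best]))
    return selected


# Toy input: two residues, two methods with scores on very different scales.
toy_scores = [
    [[2.0, 0.0, 0.0], [10.0, -5.0, -5.0]],  # both methods favour helix
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],     # no method expresses a preference
]
confident = reliable_positions(toy_scores, weights=[1.0, 1.0], threshold=0.8)
```

With these toy scores only the first residue passes the 0.8 threshold, since the second receives a uniform estimate over the three classes. In practice the weights would be fitted on labelled sequences rather than fixed by hand.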
The prediction is freely available over the Internet on the Network Protein Sequence Analysis (NPS@) WWW server at http://pbil.ibcp.fr/NPSA/npsa_server.html. The source code of the combiner can be obtained on request for academic use.