Lundström J, Rychlewski L, Bujnicki J, Elofsson A
Stockholm Bioinformatics Center, Stockholm University, SE 10691 Stockholm, Sweden.
Protein Sci. 2001 Nov;10(11):2354-62. doi: 10.1110/ps.08501.
In recent years, many protein fold recognition methods have been developed, based on different algorithms and using various kinds of information. Several evaluation experiments have been conducted to examine the performance of these methods, including blind tests in CASP/CAFASP, large-scale benchmarks, and long-term, continuous assessment with newly solved protein structures. These studies confirm the expectation that different methods produce the best predictions for different targets, and that the final prediction accuracy could be improved if the available methods were combined in a perfect manner. This article presents Pcons, a neural-network-based consensus predictor that attempts this task by selecting the best model out of those produced by six prediction servers, each using different methods. Pcons translates the confidence scores reported by each server into uniformly scaled values corresponding to the expected accuracy of each model. The translated scores, as well as the similarity between models produced by different servers, are used in the final selection. According to an analysis based on two unrelated sets of newly solved proteins, Pcons outperforms any single server by generating approximately 8%-10% more correct predictions. Furthermore, the specificity of Pcons is significantly higher than that of any individual server. Analysis of different input data to Pcons shows that the improvement is mainly attributable to the measurement of similarity between the different models. Pcons is freely accessible to the academic community through the protein structure-prediction metaserver at http://bioinfo.pl/meta/.
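The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the Pcons implementation: Pcons uses a neural network, whereas this sketch assumes a simple linear rescaling of each server's score and a fixed weighted sum. The `Model` class, the `calibration` ranges, the `weight` parameter, the server names, and the use of Jaccard overlap between aligned residue pairs as a similarity measure are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    server: str               # name of the server that produced the model
    raw_score: float          # confidence score on the server's own scale
    aligned_pairs: frozenset  # (query_residue, template_residue) pairs; hypothetical representation


def scaled_score(model, calibration):
    """Map a server-specific raw score onto a common [0, 1] scale (assumed linear)."""
    low, high = calibration[model.server]
    value = (model.raw_score - low) / (high - low)
    return min(1.0, max(0.0, value))


def similarity(a, b):
    """Jaccard overlap of aligned residue pairs, a stand-in for model similarity."""
    union = a.aligned_pairs | b.aligned_pairs
    if not union:
        return 0.0
    return len(a.aligned_pairs & b.aligned_pairs) / len(union)


def consensus_pick(models, calibration, weight=0.5):
    """Return the model with the best mix of scaled score and support from other servers."""
    best, best_score = None, float("-inf")
    for m in models:
        others = [o for o in models if o.server != m.server]
        # Average similarity to the models produced by the other servers.
        support = sum(similarity(m, o) for o in others) / len(others) if others else 0.0
        combined = (1 - weight) * scaled_score(m, calibration) + weight * support
        if combined > best_score:
            best, best_score = m, combined
    return best, best_score


# Hypothetical example: server A is the most confident, but B and C largely agree.
calibration = {"A": (0.0, 10.0), "B": (0.0, 100.0), "C": (0.0, 1.0)}
models = [
    Model("A", 8.0, frozenset({(1, 1), (2, 2), (3, 3)})),
    Model("B", 60.0, frozenset({(1, 5), (2, 6), (3, 7)})),
    Model("C", 0.5, frozenset({(1, 5), (2, 6), (3, 8)})),
]
best, score = consensus_pick(models, calibration)
print(best.server, round(score, 3))
```

In this toy input the model from server B is selected even though server A reports the highest scaled confidence, because B's alignment is partly shared with C. This mirrors, in a very simplified way, the abstract's observation that similarity between models contributes most of the improvement.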