Stockholm Bioinformatics Center, Center for Biomembrane Research, Department of Biochemistry and Biophysics, Science for Life Laboratory, Swedish E-science Research Center, Stockholm University, Stockholm, Sweden.
Proteomics. 2012 Aug;12(14):2282-94. doi: 10.1002/pmic.201100495.
Current state-of-the-art methods have been reported to predict the correct topology of membrane proteins with an accuracy above 80%. However, this performance has only been observed in small and possibly biased data sets obtained from protein structures or biochemical assays. Here, we test a number of topology predictors on an "unseen" set of proteins of known structure and also on four "genome-scale" data sets, including one recent large set of experimentally validated human membrane proteins with glycosylated sites. The set of glycosylated proteins is also used to examine the ability of prediction methods to separate membrane from nonmembrane proteins. The results show that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method that combines several of the other prediction methods. The best methods to distinguish membrane from nonmembrane proteins belong to the "Phobius" group of predictors. We further observe that the high accuracies reported for the smaller benchmark sets are not fully maintained in larger scale benchmarks. Instead, we estimate the performance of the best prediction methods for eukaryotic membrane proteins to be between 60% and 70%. The low agreement between predictions from different methods calls into question earlier estimates of the global properties of the membrane proteome. Finally, we suggest a pipeline to estimate these properties using a combination of the best predictors that could be applied in large-scale proteomics studies of membrane proteins.
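The notion of agreement between predictions from different methods can be made concrete with a small sketch. The Python below is an illustration only, not the benchmark procedure used in the study: it assumes each predictor emits a per-residue label string ('i' for inside, 'o' for outside, 'M' for membrane), and it assumes that two topologies agree when they share the same N-terminal side and the same number of transmembrane segments. The function names, the method names, and the toy label strings are all hypothetical.

```python
from itertools import combinations, groupby


def summarize(topology):
    """Return (n_terminal_side, tm_count) for a per-residue label string.

    Assumed labels: 'i' = inside, 'o' = outside, 'M' = membrane.
    """
    if not topology or topology[0] not in "io":
        # Assumed convention: a prediction starts on one side of the membrane.
        raise ValueError("topology must start with 'i' or 'o'")
    # Each maximal run of 'M' labels counts as one transmembrane segment.
    tm_count = sum(1 for label, _ in groupby(topology) if label == "M")
    return topology[0], tm_count


def topologies_agree(a, b):
    """Assumed criterion: same N-terminal side and same number of TM segments."""
    return summarize(a) == summarize(b)


def pairwise_agreement(predictions):
    """Fraction of predictor pairs whose topologies agree for one protein."""
    pairs = list(combinations(predictions.values(), 2))
    if not pairs:
        raise ValueError("need at least two predictions")
    return sum(topologies_agree(a, b) for a, b in pairs) / len(pairs)


# Toy, made-up label strings for a single 20-residue sequence.
predictions = {
    "method_a": "iiiMMMMMooooMMMMMiii",
    "method_b": "iiMMMMMMoooMMMMMMiii",
    "method_c": "oooMMMMMiiiiiiiiiiii",
}

agreement = pairwise_agreement(predictions)  # one of the three pairs agrees
```

The criterion is deliberately coarse: it ignores the exact positions of the membrane segments, so two predictions that place helices in different regions would still count as agreeing. A stricter comparison would also require the segments to overlap.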