Melén Karin, Krogh Anders, von Heijne Gunnar
Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden.
J Mol Biol. 2003 Mar 28;327(3):735-44. doi: 10.1016/s0022-2836(03)00182-7.
We have developed reliability scores for five widely used membrane protein topology prediction methods, and have applied them both to a test set of 92 bacterial plasma membrane proteins with experimentally determined topologies and to all predicted helix bundle membrane proteins in three fully sequenced genomes: Escherichia coli, Saccharomyces cerevisiae and Caenorhabditis elegans. We show that the reliability scores work well for the TMHMM and MEMSAT methods, and that they allow the probability that the predicted topology is correct to be estimated for any protein. We further show that the available test set is biased towards high-scoring proteins when compared to the genome-wide data sets, and provide estimates for the expected prediction accuracy of TMHMM across the three genomes. Finally, we show that the performance of TMHMM is considerably better when limited experimental information (such as the in/out location of a protein's C terminus) is available, and estimate that at least ten percentage points in overall accuracy in whole-genome predictions can be gained in this way.
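The abstract does not say how the reliability scores are defined or calibrated, so the following is only a minimal sketch of one generic way such a score could be turned into a probability of correctness: bin the scores on a labelled test set, record the fraction of correct topologies per bin, and average those per-bin fractions over the scores seen in a genome. The function names, the single bin edge, and all numbers are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def calibrate(scores, correct, edges):
    """Return the fraction of correct predictions in each score bin.

    `edges` are ascending bin boundaries; a score s falls in bin
    bisect_right(edges, s). Bins without test proteins yield None.
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for s, ok in zip(scores, correct):
        b = bisect_right(edges, s)
        totals[b] += 1
        hits[b] += int(ok)
    return [h / t if t else None for h, t in zip(hits, totals)]


def expected_accuracy(genome_scores, bin_accuracy, edges):
    """Average the calibrated per-bin accuracy over a set of genome scores.

    Scores falling in bins with no calibration data are skipped.
    """
    probs = []
    for s in genome_scores:
        p = bin_accuracy[bisect_right(edges, s)]
        if p is not None:
            probs.append(p)
    return sum(probs) / len(probs) if probs else None


# Invented toy data: a test set skewed towards high scores, and a
# genome-wide set dominated by low scores.
edges = [0.5]
test_scores = [0.2, 0.3, 0.8, 0.9, 0.9, 0.7]
test_correct = [False, True, True, True, True, False]
genome_scores = [0.1, 0.2, 0.4, 0.9]

bin_accuracy = calibrate(test_scores, test_correct, edges)
genome_estimate = expected_accuracy(genome_scores, bin_accuracy, edges)
test_set_accuracy = sum(test_correct) / len(test_correct)
```

In this toy case the genome-wide estimate comes out lower than the raw test-set accuracy, because most genome scores fall in the low-scoring bin. That is the kind of effect a test set biased towards high-scoring proteins would produce.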