Chien Peter Chen, Andrew Kernytsky, Burkhard Rost
Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA.
Protein Sci. 2002 Dec;11(12):2774-91. doi: 10.1110/ps.0214502.
Methods that predict membrane helices have become increasingly useful for analyzing entire proteomes as well as for everyday sequence analysis. Here, we analyzed 27 advanced and simple methods in detail. To resolve contradictions among previous works and to reevaluate transmembrane helix prediction algorithms, we introduced an analysis that distinguished between performance on redundancy-reduced high- and low-resolution data sets, established thresholds for significant differences in performance, and implemented both per-segment and per-residue analysis of membrane helix predictions. Although some of the advanced methods performed better than others, we showed, in a thorough bootstrapping experiment based on various measures of accuracy, that no method performed consistently best. In contrast, most simple hydrophobicity scale-based methods were significantly less accurate than any advanced method because they overpredicted membrane helices and confused membrane helices with hydrophobic regions outside of membranes. The advanced methods, by comparison, usually distinguished correctly between membrane-helical and other proteins. Nonetheless, few methods reliably distinguished between signal peptides and membrane helices. We could not verify a significant difference in performance between eukaryotic and prokaryotic proteins. Surprisingly, we found that proteins with more than five helices were predicted at significantly lower accuracy than proteins with five or fewer. The important implication is that structurally unsolved multispanning membrane proteins, which are often important drug targets, will remain problematic for transmembrane helix prediction algorithms. Overall, by establishing a standardized methodology for evaluating transmembrane helix prediction, we have resolved differences among previous works and presented novel trends that may impact the analysis of entire proteomes.
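The abstract contrasts per-residue and per-segment evaluation of membrane helix predictions. The following is a minimal sketch of the two views, not the paper's exact scoring procedure. It assumes that topologies are given as equal-length strings with 'H' marking membrane-helix residues. It also assumes, hypothetically, that a predicted helix counts as correct when it overlaps an observed helix by at least three residues, with greedy one-to-one matching. All function names and the `min_overlap` parameter are illustrative.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # half-open (start, end) residue interval


def segments(topology: str, label: str = "H") -> List[Segment]:
    """Return contiguous runs of `label` as half-open (start, end) intervals."""
    segs: List[Segment] = []
    start: Optional[int] = None
    for i, ch in enumerate(topology):
        if ch == label and start is None:
            start = i
        elif ch != label and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(topology)))
    return segs


def per_residue_accuracy(observed: str, predicted: str, label: str = "H") -> float:
    """Fraction of residues whose membrane / non-membrane state is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("empty topology")
    correct = sum((o == label) == (p == label) for o, p in zip(observed, predicted))
    return correct / len(observed)


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def per_segment_scores(observed: str, predicted: str,
                       min_overlap: int = 3, label: str = "H") -> Tuple[float, float]:
    """Return (fraction of observed helices found, fraction of predicted helices correct).

    Assumed criterion: greedy one-to-one matching, where each predicted helix can
    match at most one observed helix and must overlap it by >= min_overlap residues.
    """
    obs = segments(observed, label)
    pred = segments(predicted, label)
    used = set()
    matched = 0
    for o in obs:
        for j, p in enumerate(pred):
            if j not in used and overlap(o, p) >= min_overlap:
                used.add(j)
                matched += 1
                break
    found = matched / len(obs) if obs else 1.0
    correct = matched / len(pred) if pred else 1.0
    return found, correct


# Toy example: two observed helices, only the first one is predicted.
observed = "N" * 3 + "H" * 7 + "N" * 6 + "H" * 7 + "N" * 3
predicted = "N" * 4 + "H" * 7 + "N" * 15
print(round(per_residue_accuracy(observed, predicted), 3))  # -> 0.654
print(per_segment_scores(observed, predicted))  # -> (0.5, 1.0)
```

In this toy case the per-residue score looks moderate, while the per-segment view shows that one of the two helices is missed entirely. This illustrates why reporting both kinds of measure can be informative.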