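How strongly do sequence conservation patterns and empirical scales correlate with exposure patterns of transmembrane helices of membrane proteins?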

Park Yungki, Helms Volkhard

Center for Bioinformatics, Saarland University, 66041 Saarbruecken, Germany.

Biopolymers. 2006 Nov;83(4):389-99. doi: 10.1002/bip.20569.

Given the difficulty in determining high-resolution structures of helical membrane proteins, sequence-based prediction methods can be useful in elucidating diverse physiological processes mediated by this important class of proteins. Predicting the angular orientations of transmembrane (TM) helices about the helix axes, based on the helix parameters from electron microscopy data, is a classical problem in this regard. This problem has triggered the development of a number of different empirical scales. Recently, sequence conservation patterns have also been used for improved predictions. Empirical scales and sequence conservation patterns (collectively termed "prediction scales") have also found frequent applications in other research areas of membrane proteins: for example, in structure modeling and in prediction of buried TM helices. This trend is expected to grow in the near future unless there are revolutionary developments in the experimental characterization of membrane proteins. Thus, it is timely and imperative to carry out a comprehensive benchmark test over the prediction scales proposed so far to determine their pros and cons. In the current analysis, we use exposure patterns of TM helices as a gold standard, because if one develops a prediction scale that correlates perfectly with exposure patterns of TM helices, it will enable one to predict buried residues (or buried faces) of TM helices with an accuracy of 100%. Our analysis reveals several important points. (1) It demonstrates that sequence conservation patterns are much more strongly correlated with exposure patterns of TM helices than empirical scales. (2) Scales that were specifically parameterized using structure data (structure-based scales) display stronger correlation than hydrophobicity-based scales, as expected. (3) A nonnegligible difference is observed among the structure-based scales in how strongly they correlate with exposure patterns, suggesting that not every learning algorithm is equally effective. (4) A straightforward framework for optimally combining sequence conservation patterns and empirical scales is proposed, which reveals that improvements gained from combining the two sources of information are not dramatic in almost all cases. In turn, this calls for the development of fundamentally different scales that capture the essentials of membrane protein folding if substantial improvements are to be achieved.
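The abstract does not specify how the correlations were measured or how the two sources of information were combined. As a rough illustration only, the sketch below assumes a per-residue Pearson correlation against exposure values and a one-parameter weighted sum of z-scored inputs, searched on a grid. The function names, the weighting scheme, and the convention that higher input values mean "more exposed" (for example, sequence variability rather than conservation) are all assumptions made for this example, not the authors' framework.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def zscore(x):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    if sd == 0:
        raise ValueError("cannot z-score constant input")
    return [(a - m) / sd for a in x]


def best_combination(variability, scale, exposure, steps=100):
    """Grid-search the weight w in [0, 1] that maximizes the correlation of
    w * variability + (1 - w) * scale with exposure.

    Assumption: all three inputs are per-residue values oriented so that
    higher means more exposed. Returns (best_w, best_r).
    """
    v, s = zscore(variability), zscore(scale)
    best_w, best_r = None, -2.0
    for i in range(steps + 1):
        w = i / steps
        combined = [w * a + (1 - w) * b for a, b in zip(v, s)]
        try:
            r = pearson(combined, exposure)
        except ValueError:
            continue  # skip weights where the combined profile is constant
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    exposure = [0.1, 0.8, 0.3, 0.9, 0.2, 0.7]
    variability = [0.2, 0.9, 0.3, 0.8, 0.1, 0.6]
    scale = [0.5, 0.6, 0.1, 0.9, 0.4, 0.3]
    print(pearson(variability, exposure), pearson(scale, exposure))
    print(best_combination(variability, scale, exposure))
```

Because the grid includes w = 0 and w = 1, the combined correlation can never fall below that of either input alone; comparing it with the better single input gives a simple measure of how much the combination actually adds.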
