Edelman J
Department of Physiology and Biophysics, University of California, Irvine 92717.
J Mol Biol. 1993 Jul 5;232(1):165-91. doi: 10.1006/jmbi.1993.1375.
Sliding-window averaging of amino acid properties is often used for predicting protein secondary structure. Such a scheme (linear convolutional recognizer, LCR) assigns a number (weight) to each type of monomer, and then convolves a window function with the sequence of weights to yield a decision function. Features (regions having the property of interest) are predicted to occur where the decision function exceeds some threshold. A general method for approximating the best possible window and weights is presented. The required data are the sequences of some chains and the locations of their features. The method is applied to transmembrane helices (TMH) of membrane proteins. Optimal weights and windows are calculated, using bacteriorhodopsin and photosynthetic reaction centers as the reference chains. The predictor is then tested on other proteins. No TMH are predicted in porin, whose transmembrane segments are beta-sheets. This shows that the predictor is specific for helical segments. Few segments are predicted for non-membrane globular proteins. The predictor thus correctly rejects their hydrophobic helices. Finally, the predictor is tested with some membrane proteins whose transmembrane topology is partially known. Among their TMH, the LCR is unable to resolve 6% which are closely spaced. Taking 17 as the minimum allowed length of a predicted TMH, 4% of the known ones are missed and 6% of the predicted ones are false. For a minimum length of 10, 0.5% are missed and 14% are false. The mean magnitude of the endpoint error is about four residues. Alternative prediction methods make more errors.
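The prediction step described above (per-residue weights, convolution with a window, a threshold, and a minimum segment length) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's predictor: the abstract does not give the optimized weights or window, so the sketch substitutes the Kyte-Doolittle hydropathy scale as stand-in weights and a flat 7-residue window, and the threshold of 1.5 is arbitrary. The names `WEIGHTS`, `decision_function`, and `predict_features` are hypothetical.

```python
# Stand-in weights: the Kyte-Doolittle hydropathy scale.
# ASSUMPTION: these are NOT the optimized weights from the paper.
WEIGHTS = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def decision_function(sequence, window):
    """Convolve an odd-length window with the sequence of residue weights.

    The window is centered on each residue; positions that fall outside
    the chain contribute zero (zero padding).
    """
    weights = [WEIGHTS[residue] for residue in sequence]
    half = len(window) // 2
    n = len(weights)
    values = []
    for i in range(n):
        total = 0.0
        for k, w in enumerate(window):
            j = i + half - k  # convolution index
            if 0 <= j < n:
                total += w * weights[j]
        values.append(total)
    return values


def predict_features(sequence, window, threshold, min_length):
    """Return (start, end) index pairs, 0-based and inclusive, of runs where
    the decision function exceeds the threshold for at least min_length
    consecutive residues."""
    values = decision_function(sequence, window)
    segments = []
    start = None
    # A trailing -inf sentinel closes any run still open at the chain end.
    for i, value in enumerate(values + [float("-inf")]):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    return segments


# Toy chain: a 20-residue leucine stretch flanked by aspartates.
toy_chain = "DDDDD" + "L" * 20 + "DDDDD"
flat_window = [1.0 / 7] * 7  # ASSUMPTION: flat window, not the optimized one
segments = predict_features(toy_chain, flat_window, threshold=1.5, min_length=17)
print(segments)
```

Raising `min_length` trades missed features against false ones, which is the trade-off the abstract reports for minimum lengths of 17 and 10. Averaging also blurs the boundaries, so the predicted segment on the toy chain is slightly shorter than the true leucine stretch.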