Lo Allan, Chiu Hua-Sheng, Sung Ting-Yi, Hsu Wen-Lian
Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan.
Comput Syst Bioinformatics Conf. 2006:31-42.
A key class of membrane proteins contains one or more transmembrane (TM) helices that traverse the membrane lipid bilayer. Properties of TM helices such as their length, arrangement, and topology (orientation) are closely related to a protein's function. Although a range of methods has been developed to predict TM helices and their topologies, no single method consistently outperforms the others. In addition, topology prediction is much less accurate than helix prediction and thus requires continued improvement.
We develop a method based on support vector machines (SVMs) in a hierarchical framework that predicts TM helices first and then their topology. Partitioning the prediction problem into two steps allows specific input features to be selected and integrated in each step. We also propose a novel scoring function for topology models based on the membrane protein folding process. When benchmarked against other methods, our approach achieves the highest scores, 86% in helix prediction (Q(2)) and 91% in topology prediction (TOPO) on the high-resolution data set, improvements of 6% and 14%, respectively, over the second-best method. Furthermore, we demonstrate that our method discriminates between membrane and non-membrane proteins with accuracy above 99%. When tested on a small set of newly solved membrane protein structures, our method overcomes some of the difficulties in predicting TM helices by incorporating multiple biological input features.
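As an illustration of the helix-prediction score reported above, the following minimal sketch assumes that Q(2) denotes per-residue two-state accuracy, i.e. the percentage of residues whose helix or non-helix label is predicted correctly. The exact definition used in the paper (for example, whether scores are averaged per protein) is not given in this abstract, and the function names, the label alphabet ('H' for TM helix, '-' for everything else), and the example strings are hypothetical.

```python
def q2(predicted: str, observed: str) -> float:
    """Percentage of residues whose two-state label matches.

    Assumption: 'H' marks a TM-helix residue and '-' any other residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def helix_segments(labels: str) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of 'H'."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == "H" and start is None:
            start = i  # a new helix run begins here
        elif label != "H" and start is not None:
            segments.append((start, i))  # the run ended at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(labels)))  # run extends to the sequence end
    return segments


# Made-up labels for illustration only; not data from the paper.
observed = "--HHHHHH----HHHHH-"
predicted = "--HHHHH-----HHHHHH"
print(q2(predicted, observed))
print(helix_segments(predicted))
```

A per-residue score of this kind rewards overlap with the true helices but says nothing about orientation, which is presumably why topology (TOPO) is reported as a separate measure.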