Sonnhammer E L, von Heijne G, Krogh A
National Center for Biotechnology Information, NLM/NIH, Bethesda, Maryland 20894, USA.
Proc Int Conf Intell Syst Mol Biol. 1998;6:175-82.
A novel method to model and predict the location and orientation of alpha helices in membrane-spanning proteins is presented. It is based on a hidden Markov model (HMM) with an architecture that corresponds closely to the biological system. The model is cyclic, with seven types of states: helix core, helix caps on either side, a loop on the cytoplasmic side, two loops on the non-cytoplasmic side, and a globular domain state in the middle of each loop. The two loop paths on the non-cytoplasmic side model short and long loops separately, which corresponds biologically to the two known membrane insertion mechanisms. The close mapping between the biological and computational states allows us to infer which parts of the model architecture are important for capturing the information that encodes the membrane topology, and to gain a better understanding of the mechanisms and constraints involved. Models were estimated both by maximum likelihood and by a discriminative method, and a method for reassigning the membrane helix boundaries was developed. In a cross-validated test on single sequences, our transmembrane HMM, TMHMM, correctly predicts the entire topology for 77% of the sequences in a standard dataset of 83 proteins with known topology. The same accuracy was achieved on a larger dataset of 160 proteins. These results compare favourably with existing methods.
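The cyclic architecture described above can be illustrated with a small Viterbi decoder. The sketch below is not TMHMM: it collapses the seven state types into four hypothetical states (cytoplasmic loop `i`, outward-running helix `Mo`, non-cytoplasmic loop `o`, inward-running helix `Mi`), places no constraint on helix length, and uses made-up transition and emission probabilities rather than estimated ones. It only shows how a cyclic state graph forces predicted loops to alternate between the two sides of the membrane.

```python
import math

# Toy residue classes (an assumption for illustration, not TMHMM parameters).
HYDROPHOBIC = set("AILMFVWC")   # 8 of the 20 standard amino acids
POSITIVE = set("KR")            # enriched in cytoplasmic loops in this toy model

# Hypothetical 4-state cycle: i -> Mo -> o -> Mi -> i
STATES = ["i", "Mo", "o", "Mi"]
LABEL = {"i": "i", "Mo": "M", "o": "o", "Mi": "M"}

# Made-up transition probabilities: stay in the state or advance around the cycle.
TRANS = {
    "i":  {"i": 0.90, "Mo": 0.10},
    "Mo": {"Mo": 0.95, "o": 0.05},
    "o":  {"o": 0.90, "Mi": 0.10},
    "Mi": {"Mi": 0.95, "i": 0.05},
}
START = {"i": 0.5, "o": 0.5}    # a sequence may begin on either side


def emit(state, aa):
    """Toy emission probability; each distribution sums to 1 over 20 residues."""
    label = LABEL[state]
    if label == "M":
        return 0.8 / 8 if aa in HYDROPHOBIC else 0.2 / 12
    if label == "i":
        return 0.3 / 2 if aa in POSITIVE else 0.7 / 18
    return 1 / 20


def viterbi(seq):
    """Return the most probable label string (i/M/o) for seq, in log space."""
    neg_inf = float("-inf")
    score = {
        s: math.log(START[s]) + math.log(emit(s, seq[0])) if s in START else neg_inf
        for s in STATES
    }
    back = []  # one dict of back-pointers per position after the first
    for aa in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, neg_inf
            for p in STATES:
                t = TRANS[p].get(s)
                if t is None or score[p] == neg_inf:
                    continue  # transition not allowed or predecessor unreachable
                cand = score[p] + math.log(t)
                if cand > best:
                    best, best_prev = cand, p
            pointers[s] = best_prev
            new_score[s] = best + math.log(emit(s, aa)) if best_prev else neg_inf
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join(LABEL[s] for s in path)


if __name__ == "__main__":
    print(viterbi("KRKRKR" + "L" * 20 + "SGSGSG"))
```

With these toy parameters, the 20-leucine run in the example is labelled `M`, flanked by `i` on the lysine/arginine-rich side and `o` on the other. Because the only route between `i` and `o` passes through a helix state, a prediction can never place two adjacent loops on opposite sides of the membrane without a helix between them.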