Persson B, Argos P
European Molecular Biology Laboratory, Heidelberg, Germany.
J Mol Biol. 1994 Mar 25;237(2):182-92. doi: 10.1006/jmbi.1994.1220.
A method for prediction of transmembrane segments from multiply aligned amino acid sequences is presented. For the calculations, two sets of propensity values were used: one for the middle, hydrophobic portion and one for the terminal regions of the transmembrane sequence spans. Average propensity values were calculated for each position along the alignment, with the contribution from each sequence weighted according to its dissimilarity relative to the other aligned sequences. Eight-residue segments were considered as potential cores of transmembrane segments and elongated if their middle propensity values were above a given threshold. End propensity values were also considered as stop signals. Only helices with lengths of 15 to 29 residues were allowed, and corrections for strictly conserved charged residues were also made. The method is shown to be more successful than predictions based upon single sequences alone. In the test set of 28 families with 126 transmembrane segments, only five spans were not predicted or constituted false positives. The method is applied to sequence families for which data on transmembrane segments do not exist or are sparse or contradictory, including voltage-gated potassium channels, cytochrome c oxidases, NADH-ubiquinone oxidoreductase, beta-glucosides-specific phosphotransferase enzyme and the major surface antigen of hepatitis B virus.
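The core of the procedure (dissimilarity-weighted column averages, eight-residue cores, elongation, and the 15 to 29 residue length filter) can be illustrated with a minimal Python sketch. It is not the published implementation: the propensity scale `TOY_MIDDLE`, both thresholds, and the exact weighting formula are assumptions made here for illustration, and the end-propensity stop signals and the charged-residue corrections are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "middle" propensity scale (NOT the published values):
# a few hydrophobic residues score 1.0, everything else scores 0.0.
TOY_MIDDLE: Dict[str, float] = {aa: 1.0 for aa in "AILMFVWC"}


def dissimilarity_weights(alignment: List[str]) -> List[float]:
    """Weight each sequence by its mean fraction of mismatches to the others.

    Assumed weighting scheme; the abstract only says weights reflect
    dissimilarity relative to the other aligned sequences.
    """
    n = len(alignment)
    if n == 1:
        return [1.0]
    raw = []
    for i, a in enumerate(alignment):
        diffs = [sum(x != y for x, y in zip(a, b)) / len(a)
                 for j, b in enumerate(alignment) if j != i]
        raw.append(sum(diffs) / len(diffs))
    total = sum(raw)
    if total == 0:  # all sequences identical: fall back to equal weights
        return [1.0 / n] * n
    return [r / total for r in raw]


def column_profile(alignment: List[str], scale: Dict[str, float],
                   default: float = 0.0) -> List[float]:
    """Weighted average propensity for each alignment column."""
    weights = dissimilarity_weights(alignment)
    return [sum(w * scale.get(seq[pos], default)
                for w, seq in zip(weights, alignment))
            for pos in range(len(alignment[0]))]


def predict_segments(profile: List[float], core_len: int = 8,
                     core_threshold: float = 0.9,    # hypothetical value
                     extend_threshold: float = 0.5,  # hypothetical value
                     min_len: int = 15, max_len: int = 29
                     ) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans grown from eight-residue cores."""
    segments: List[Tuple[int, int]] = []
    n = len(profile)
    pos = 0
    while pos + core_len <= n:
        if sum(profile[pos:pos + core_len]) / core_len < core_threshold:
            pos += 1
            continue
        start, end = pos, pos + core_len
        floor = segments[-1][1] if segments else 0  # never overlap previous span
        grew = True
        while grew and end - start < max_len:
            grew = False
            if start > floor and profile[start - 1] >= extend_threshold:
                start -= 1
                grew = True
            if end - start < max_len and end < n and profile[end] >= extend_threshold:
                end += 1
                grew = True
        if end - start >= min_len:  # keep only spans of 15 to 29 residues
            segments.append((start, end))
        pos = end
    return segments
```

For a two-sequence toy alignment such as `["GGGG" + "L" * 20 + "GGGG", "GGGG" + "I" * 20 + "GGGG"]`, `predict_segments(column_profile(alignment, TOY_MIDDLE))` yields one span covering the 20 hydrophobic columns, whereas a 10-residue hydrophobic stretch is rejected by the length filter.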