Yuan Zheng, Davis Melissa J, Zhang Fasheng, Teasdale Rohan D
ARC Centre in Bioinformatics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4066, Australia.
Biochem Biophys Res Commun. 2003 Dec 26;312(4):1278-83. doi: 10.1016/j.bbrc.2003.11.069.
Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This shared feature makes it difficult for signal peptide and transmembrane helix predictors to correctly identify stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. Failing to reliably distinguish an N-terminal transmembrane helix from a signal peptide causes errors with serious consequences for predicting a protein's secretory status or transmembrane topology. In this study, we report a new method for differentiating N-terminal signal peptides from transmembrane helices. Using sequence features extracted from the hydrophobic regions (amino acid frequency, hydrophobicity, and start position), we constructed discriminant functions and evaluated them on non-redundant datasets with jackknife tests. The method can be combined with other signal peptide prediction methods to achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90); at a sensitivity of 90%, transmembrane helices can be distinguished from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.83); at a sensitivity of 90%, transmembrane helices can be distinguished from signal peptides with a precision of 87% (coefficient 0.85). The method can complement current transmembrane protein and signal peptide prediction methods to improve their prediction accuracy.
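The abstract names the feature types (amino acid frequency, hydrophobicity, start position) and the evaluation protocol (jackknife tests) but not the exact window, scale, or form of the discriminant functions. What follows is a minimal sketch of that general workflow under loudly labeled assumptions, not the authors' method: the hydrophobic region is taken as the most hydrophobic 11-residue window within the first 50 residues (hypothetical `WINDOW` and `SEARCH_LEN`), hydrophobicity uses the Kyte-Doolittle scale, and a diagonal linear discriminant (pooled per-feature variance, equal priors) is swapped in for the unspecified discriminant functions. Labels are assumed to be 1 = transmembrane helix and 0 = signal peptide; `features`, `train`, `predict`, and `jackknife` are hypothetical names.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# Kyte-Doolittle hydropathy scale (assumed choice of hydrophobicity scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)
WINDOW = 11      # hypothetical length of the hydrophobic stretch
SEARCH_LEN = 50  # hypothetical N-terminal region searched for that stretch


def features(seq: str) -> List[float]:
    """20 amino acid frequencies, mean hydropathy, and 1-based start position
    of the most hydrophobic WINDOW-residue stretch near the N-terminus.
    Residues missing from KD (e.g. 'X') score 0 and are not counted."""
    region = seq[:SEARCH_LEN].upper()
    if len(region) < WINDOW:
        raise ValueError("sequence shorter than WINDOW")
    best = max(range(len(region) - WINDOW + 1),
               key=lambda s: sum(KD.get(a, 0.0) for a in region[s:s + WINDOW]))
    win = region[best:best + WINDOW]
    freqs = [win.count(a) / WINDOW for a in AMINO_ACIDS]
    return freqs + [mean(KD.get(a, 0.0) for a in win), float(best + 1)]


def train(X: Sequence[List[float]], y: Sequence[int]) -> Tuple[List[float], float]:
    """Diagonal linear discriminant with equal priors: score(x) = w.x + b, where
    w_j = (mu1_j - mu0_j) / var_j and var_j is the pooled within-class variance."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w: List[float] = []
    b = 0.0
    for j in range(len(X[0])):
        m1 = mean(x[j] for x in pos)
        m0 = mean(x[j] for x in neg)
        ss = sum((x[j] - m1) ** 2 for x in pos) + sum((x[j] - m0) ** 2 for x in neg)
        var = ss / max(len(X) - 2, 1) + 1e-6  # small ridge avoids division by zero
        w.append((m1 - m0) / var)
        b -= w[-1] * (m1 + m0) / 2  # threshold at the midpoint of the class means
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """1 = transmembrane helix, 0 = signal peptide (assumed label encoding)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def jackknife(seqs: Sequence[str], labels: Sequence[int]) -> List[int]:
    """Leave-one-out test: each sequence is predicted by a model trained on all others."""
    X = [features(s) for s in seqs]
    y = list(labels)
    preds: List[int] = []
    for i in range(len(X)):
        w, b = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(w, b, X[i]))
    return preds
```

The jackknife predictions can then be compared with the true labels to count true and false positives and negatives. A diagonal discriminant was chosen only because it needs no matrix inversion and stays stable with 22 features and small training sets; a full-covariance linear discriminant or another classifier could be substituted.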
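The abstract reports accuracy, sensitivity, precision, and an unnamed "coefficient". A minimal sketch of how such figures are commonly computed follows, assuming the coefficient is the Matthews correlation coefficient (the abstract does not say so) and treating transmembrane helices as the positive class. `score_binary` and `precision_at_sensitivity` are hypothetical helper names; the second mirrors the "given a sensitivity of 90%" setting by lowering the decision threshold over discriminant scores until that sensitivity is reached.

```python
import math
from typing import NamedTuple, Sequence


class Scores(NamedTuple):
    accuracy: float
    sensitivity: float  # fraction of true TM helices predicted as TM
    precision: float    # fraction of TM predictions that are correct
    mcc: float          # Matthews correlation coefficient (assumed "coefficient")


def score_binary(tp: int, fp: int, tn: int, fn: int) -> Scores:
    """Confusion-matrix scores; positive class = transmembrane helix (assumed)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return Scores(accuracy, sensitivity, precision, mcc)


def precision_at_sensitivity(scores: Sequence[float], labels: Sequence[int],
                             target: float = 0.9) -> float:
    """Accept predictions in order of decreasing score (i.e. lower the threshold)
    until sensitivity reaches `target`, then return the precision at that point.
    Tied scores keep their input order because sorted() is stable."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        if tp / n_pos >= target:
            break
    return tp / (tp + fp)
```

For example, `score_binary(tp=9, fp=1, tn=9, fn=1)` gives accuracy, sensitivity, and precision of 0.9 and an MCC of 0.8; these counts are illustrative only and are not taken from the paper.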