Maurer-Stroh Sebastian, Eisenhaber Birgit, Eisenhaber Frank
Research Institute of Molecular Pathology, Dr. Bohr-Gasse 7, Vienna, A-1030, Austria.
J Mol Biol. 2002 Apr 5;317(4):541-57. doi: 10.1006/jmbi.2002.5426.
Myristoylation by the myristoyl-CoA:protein N-myristoyltransferase (NMT) is an important lipid anchor modification of eukaryotic and viral proteins. Automated prediction of N-terminal N-myristoylation from the substrate protein sequence alone is necessary for large-scale sequence annotation projects, but it requires a low rate of false positive hits in addition to sufficient sensitivity. Our previous analysis of substrate protein sequence variability, NMT sequences and 3D structures has revealed motif properties, beyond the known PROSITE motif, that are utilized in the new predictor described here. The composite prediction function (with separate ad hoc parameterization (a) for queries from non-fungal eukaryotes and their viruses and (b) for sequences from fungal species) consists of terms evaluating amino acid type preferences at sequence positions close to the N terminus as well as terms penalizing deviations from the physical property pattern of amino acid side-chains encoded in multi-residue correlations within the motif sequence. The algorithm has been validated with a self-consistency test and two jack-knife tests for the learning set as well as with kinetic data for model substrates. The sensitivity in recognizing documented NMT substrates is above 95 % for both taxon-specific versions. The corresponding rate of false positive predictions (for sequences with an N-terminal glycine residue) is close to 0.5 %; thus, the technique is applicable for large-scale automated sequence database annotation. The predictor is available as a public WWW server at the URL http://mendel.imp.univie.ac.at/myristate/.
Additionally, we propose a version of the predictor that identifies a number of proteolytic protein processing sites at internal glycine residues and that evaluates possible N-terminal myristoylation of the resulting protein fragments. A scan of public protein databases revealed new potential NMT targets for which the myristoyl modification may be of critical importance for biological function. Among others, the list includes kinases, phosphatases, proteasomal regulatory subunit 4, kinase interacting proteins KIP1/KIP2, protozoan flagellar proteins, and homologues of the mitochondrial translocase TOM40, of the neuronal calcium sensor NCS-1 and of the cytochrome c-type heme lyase CCHL. Analyses of complete eukaryote genomes indicate that about 0.5 % of all encoded proteins are apparent NMT substrates, except for a higher fraction in Arabidopsis thaliana (approximately 0.8 %).
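The composite prediction function described in the abstract combines two kinds of terms: position-specific amino acid preferences near the N terminus, and penalties for deviating from a physical property pattern spanning several residues. The Python sketch below illustrates that general structure only. It is not the published predictor: the profile values, the choice of net charge as the physical property, the allowed charge range, the penalty weight, the threshold and all function names are invented for illustration, and the example sequences are arbitrary.

```python
from typing import Dict, List

# Toy per-position preferences for the residues that follow the initiator Met.
# ASSUMPTION: every number here is invented; these are not published parameters.
TOY_PROFILE: List[Dict[str, float]] = [
    {"G": 2.0},                          # position 1: N-terminal glycine
    {"S": 0.5, "A": 0.5, "D": -1.0, "E": -1.0, "P": -1.0},  # position 2
    {},                                  # position 3: no preference in this toy
    {},                                  # position 4: no preference in this toy
    {"S": 1.0, "T": 1.0},                # position 5
    {"K": 0.5, "R": 0.5},                # position 6
]

# ASSUMPTION: net side-chain charge stands in for the "physical property pattern".
CHARGE: Dict[str, int] = {"K": 1, "R": 1, "D": -1, "E": -1}
ALLOWED_NET_CHARGE = (-1, 2)   # hypothetical tolerated range for positions 2-6
PENALTY_WEIGHT = 0.5           # hypothetical weight per unit of excess charge
THRESHOLD = 3.0                # hypothetical decision threshold


def mature_n_terminus(sequence: str) -> str:
    """Drop a leading Met so that position 1 is the residue exposed after its removal."""
    sequence = sequence.strip().upper()
    return sequence[1:] if sequence.startswith("M") else sequence


def profile_term(motif: str) -> float:
    """Sum the per-position preference of each residue (0.0 if unlisted)."""
    return sum(prefs.get(res, 0.0) for prefs, res in zip(TOY_PROFILE, motif))


def property_penalty(motif: str) -> float:
    """Penalize a net charge over positions 2-6 that falls outside the allowed range."""
    net = sum(CHARGE.get(res, 0) for res in motif[1:])
    low, high = ALLOWED_NET_CHARGE
    excess = max(low - net, 0, net - high)
    return PENALTY_WEIGHT * excess


def composite_score(mature: str) -> float:
    """Profile term minus property penalty over the first six mature residues."""
    motif = mature[: len(TOY_PROFILE)]
    return profile_term(motif) - property_penalty(motif)


def predict_myristoylation(sequence: str) -> bool:
    """Return True if the toy score reaches the threshold; requires an N-terminal Gly."""
    mature = mature_n_terminus(sequence)
    if len(mature) < len(TOY_PROFILE) or mature[0] != "G":
        return False
    return composite_score(mature) >= THRESHOLD


if __name__ == "__main__":
    print(predict_myristoylation("MGSSKSKPKD"))  # True  (score 4.0, no penalty)
    print(predict_myristoylation("MGDEEDEA"))    # False (score -1.0 after penalty)
    print(predict_myristoylation("MASSKSKPKD"))  # False (no N-terminal glycine)
```

In this sketch, taxon-specific versions such as the fungal and non-fungal parameterizations mentioned above would correspond to swapping in a different parameter set (profile, allowed range, weight, threshold) while keeping the same scoring structure.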