Bender Andreas, van Dooren Giel G, Ralph Stuart A, McFadden Geoffrey I, Schneider Gisbert
Johann Wolfgang Goethe-Universität Frankfurt, Institut für Organische Chemie und Chemische Biologie, Marie-Curie-Strasse 11, D-60439, Frankfurt, Germany.
Mol Biochem Parasitol. 2003 Dec;132(2):59-66. doi: 10.1016/j.molbiopara.2003.07.001.
A neural network approach for the prediction of mitochondrial transit peptides (mTPs) from the malaria-causing parasite Plasmodium falciparum is presented. Nuclear-encoded mitochondrial protein precursors of P. falciparum were analyzed by statistical methods, principal component analysis and supervised neural networks, and were compared to those of other eukaryotes. A distinct amino acid usage pattern was found in protein-coding regions of P. falciparum: glycine, alanine, tryptophan and arginine are under-represented, whereas isoleucine, tyrosine, asparagine and lysine are over-represented compared to the SwissProt average. Similar patterns were observed in mTPs of P. falciparum. Using principal component analysis (PCA), mTPs from P. falciparum were shown to differ considerably from those of other organisms. A neural network system (PlasMit) for the prediction of mTPs in P. falciparum sequences was developed, based on the relative amino acid frequency in the first 24 N-terminal amino acids, yielding a Matthews correlation coefficient of 0.74 (90% correct prediction) in a 20-fold cross-validation study. This system predicted 1177 (22%) of the 5334 annotated genes in the P. falciparum genome to be mitochondrial. A second network with the same topology was trained to give a more conservative estimate. This more stringent network yielded a Matthews correlation coefficient of 0.51 (84% correct prediction) in a 10-fold cross-validation study. It predicted 381 (7.1%) of the 5334 annotated genes to be mitochondrial.
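PlasMit's input is described only as the relative amino acid frequency in the first 24 N-terminal residues. The Python sketch below shows one way to compute such a 20-dimensional composition vector. The function name `n_terminal_composition`, the alphabetical residue order, and the choice to skip non-standard letters (X, B, Z, ...) are illustrative assumptions, not details taken from the paper. The paper also does not say how the vector is scaled before it enters the network.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes). This fixed order is an
# assumption; it defines which vector position belongs to which residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(sequence: str, window: int = 24) -> list[float]:
    """Relative frequency of each standard amino acid in the first
    `window` residues of `sequence` (hypothetical helper).

    Non-standard letters inside the window are ignored, and the
    denominator is the number of standard residues actually counted.
    """
    segment = sequence.upper()[:window]
    counts = Counter(aa for aa in segment if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard N-terminus: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence (hypothetical, not a real P. falciparum protein).
features = n_terminal_composition("MKKKKAAAAL" + "G" * 20)
print({aa: round(f, 3) for aa, f in zip(AMINO_ACIDS, features) if f > 0})
```

Only residues 1 to 24 contribute, so anything downstream of the putative transit peptide has no effect on the feature vector. Sequences shorter than the window are normalized over the residues they actually have.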
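The abstract reports each network's performance as a Matthews correlation coefficient (MCC) together with the percentage of correct predictions, and states genome-wide counts out of 5334 annotated genes. The sketch below uses the standard MCC formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and checks the reported percentages. The helper names and the confusion-matrix counts are hypothetical and do not come from the study. Returning 0.0 when the denominator vanishes is a common convention, not something the paper specifies.

```python
import math


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient of a binary classifier.

    Returns 0.0 when any row or column of the confusion matrix is empty,
    since the formula is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions ("% correct prediction")."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical confusion matrix (NOT from the study): 100 test sequences.
tp, tn, fp, fn = 45, 45, 5, 5
print(f"MCC = {matthews_cc(tp, tn, fp, fn):.2f}, "
      f"accuracy = {accuracy(tp, tn, fp, fn):.0%}")

# Genome-wide fractions from the abstract (5334 annotated genes).
for label, predicted in [("PlasMit", 1177), ("stringent network", 381)]:
    print(f"{label}: {predicted}/5334 = {predicted / 5334:.1%}")
```

MCC uses all four cells of the confusion matrix, so it is less easily inflated than raw accuracy when one class dominates the data. The same accuracy can therefore correspond to quite different MCC values depending on class balance.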