School of Electrical Engineering and Computer Science, Washington State University, Pullman, Washington, United States of America.
PLoS One. 2012;7(5):e36598. doi: 10.1371/journal.pone.0036598. Epub 2012 May 4.
BACKGROUND: Most parasites of the phylum Apicomplexa contain a relict prokaryotic-derived plastid called the apicoplast. This organelle is essential for the survival of the parasite, and its unique properties make it an ideal drug target. The majority of apicoplast-associated proteins are nuclear encoded and targeted post-translationally to the organellar lumen via a bipartite signaling mechanism that requires an N-terminal signal and transit peptide (TP). Attempts to define a consensus motif that universally identifies apicoplast TPs have failed.
METHODOLOGY/PRINCIPAL FINDINGS: In this study, we propose a generalized rule-based classification model to identify apicoplast-targeted proteins (ApicoTPs) that use a bipartite signaling mechanism. Given a training set specific to an organism, this model, called ApicoAP, incorporates a procedure based on a genetic algorithm to tailor a discriminating rule that exploits the known characteristics of ApicoTPs. Performance of ApicoAP is evaluated on four labeled datasets of Plasmodium falciparum, Plasmodium yoelii, Babesia bovis, and Toxoplasma gondii proteins. On the published P. falciparum dataset, ApicoAP raises classification accuracy to 94%, up from the 90% originally obtained with PlasmoAP.
CONCLUSIONS/SIGNIFICANCE: We present a parametric model for ApicoTPs and a procedure to optimize the model parameters for a given training set. A major asset of this model is that it is customizable to different parasite genomes. The ApicoAP prediction software is available at http://code.google.com/p/apicoap/ and http://bcb.eecs.wsu.edu.
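The idea of a parametric rule whose parameters are tuned by a genetic algorithm on an organism-specific training set can be illustrated with a minimal sketch. This is not the ApicoAP rule or its software: the rule form (a window of the sequence whose basic-residue fraction must be high and acidic-residue fraction low), the parameter names, the mutation and crossover operators, and the toy sequences are all assumptions made up for illustration.

```python
import random

# Hypothetical residue classes used by the toy rule (an assumption, not ApicoAP's).
BASIC, ACIDIC = set("KR"), set("DE")

def features(seq, start, length):
    """Fractions of basic and acidic residues in seq[start:start+length]."""
    window = seq[start:start + length]
    if not window:
        return 0.0, 0.0
    n = len(window)
    return (sum(c in BASIC for c in window) / n,
            sum(c in ACIDIC for c in window) / n)

def predict(rule, seq):
    """rule = (start, length, min_basic, max_acidic); all hypothetical parameters."""
    start, length, min_basic, max_acidic = rule
    basic, acidic = features(seq, start, length)
    return basic >= min_basic and acidic <= max_acidic

def accuracy(rule, data):
    """Fraction of (sequence, label) pairs the rule classifies correctly."""
    return sum(predict(rule, s) == y for s, y in data) / len(data)

def random_rule(rng):
    return (rng.randint(0, 30), rng.randint(5, 40),
            rng.random() * 0.5, rng.random() * 0.5)

def mutate(rule, rng):
    """Perturb each parameter slightly, keeping it in a valid range."""
    start, length, min_basic, max_acidic = rule
    return (max(0, start + rng.randint(-3, 3)),
            max(1, length + rng.randint(-3, 3)),
            min(1.0, max(0.0, min_basic + rng.uniform(-0.05, 0.05))),
            min(1.0, max(0.0, max_acidic + rng.uniform(-0.05, 0.05))))

def crossover(a, b, rng):
    """Uniform crossover: take each parameter from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))

def evolve(data, pop_size=40, generations=30, seed=0):
    """Elitist genetic algorithm: keep the top quarter, refill with offspring."""
    rng = random.Random(seed)
    pop = [random_rule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: accuracy(r, data), reverse=True)
        elite = pop[: pop_size // 4]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda r: accuracy(r, data))

# Synthetic toy strings, not real proteins: label True = "targeted".
TOY_DATA = [
    ("M" + "L" * 20 + "K" * 15 + "A" * 10, True),
    ("M" + "L" * 20 + "KRKNKRKNKRKNKRK" + "A" * 10, True),
    ("M" + "L" * 20 + "D" * 15 + "A" * 10, False),
    ("M" + "L" * 20 + "EDEAEDEAEDEAEDE" + "A" * 10, False),
]

if __name__ == "__main__":
    best = evolve(TOY_DATA)
    print("best rule:", best, "training accuracy:", accuracy(best, TOY_DATA))
```

Because the rule parameters are learned from whatever labeled set is supplied, the same procedure can in principle be rerun per organism, which is the customizability the abstract emphasizes. A real evaluation would measure accuracy on held-out data rather than on the training set as done here.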