School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA.
J Microbiol Methods. 2013 Dec;95(3):313-9. doi: 10.1016/j.mimet.2013.09.017. Epub 2013 Oct 3.
BACKGROUND: Computational identification of apicoplast-targeted proteins is important in drug target determination for diseases such as malaria. While there are established methods for identifying proteins with a bipartite signal in multiple species of Apicomplexa, not all apicoplast-targeted proteins possess this bipartite signature. Recently published experimental findings on apicoplast membrane proteins that lack a bipartite signal, referred to here as transmembrane proteins, have made it feasible to devise a machine learning approach for identifying this new class of apicoplast-targeted proteins computationally.
METHODOLOGY/PRINCIPAL FINDINGS: In this work, we develop a method for predicting apicoplast-targeted transmembrane proteins for multiple species of Apicomplexa, whereby several classifiers trained on different feature sets and based on different algorithms are evaluated and combined in an ensemble classification model to obtain the best expected performance. The feature sets considered are the hydrophobicity and composition characteristics of amino acids over transmembrane domains, the existence of short sequence motifs over cytosolically disposed regions, and Gene Ontology (GO) terms associated with given proteins. Our model, ApicoAMP, is an ensemble classification model that combines decisions of classifiers following the majority vote principle. ApicoAMP is trained on a set of proteins from 11 apicomplexan species and achieves 91% overall expected accuracy.
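As a rough illustration of the ideas in this paragraph, the sketch below computes two simple features over a transmembrane segment and combines binary classifier decisions by majority vote. It is not the ApicoAMP implementation. The Kyte-Doolittle hydropathy scale, the function names, and the tie-breaking rule are all assumptions made for the example, since the abstract specifies none of them.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(segment):
    """Average hydropathy of a non-empty segment of standard residues."""
    return sum(KD[aa] for aa in segment) / len(segment)


def composition(segment):
    """Fraction of each of the 20 standard amino acids in the segment."""
    counts = Counter(segment)
    return {aa: counts[aa] / len(segment) for aa in KD}


def majority_vote(votes):
    """Combine boolean decisions (True = predicted apicoplast-targeted).

    Returns True only when strictly more than half of the votes are True,
    so a tie counts as a negative prediction (an assumption of this sketch).
    """
    return 2 * sum(votes) > len(votes)


if __name__ == "__main__":
    segment = "LIVALLFAGI"  # hypothetical transmembrane segment
    print(mean_hydrophobicity(segment))
    print(composition(segment)["L"])
    # Hypothetical decisions from three independently trained classifiers.
    print(majority_vote([True, False, True]))
```

Using an odd number of classifiers avoids ties altogether; with an even number, the rule above defaults to a negative prediction.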
CONCLUSIONS/SIGNIFICANCE: ApicoAMP is the first computational model capable of identifying apicoplast-targeted transmembrane proteins in Apicomplexa. The ApicoAMP prediction software is available at http://code.google.com/p/apicoamp/ and http://bcb.eecs.wsu.edu.