Kandaswamy Krishna Kumar, Pugalenthi Ganesan, Möller Steffen, Hartmann Enno, Kalies Kai-Uwe, Suganthan P N, Martinetz Thomas
Institute for Neuro and Bioinformatics, University of Lübeck, Germany.
Protein Pept Lett. 2010 Dec;17(12):1473-9. doi: 10.2174/0929866511009011473.
Apoptosis is an essential process for controlling tissue homeostasis by regulating the physiological balance between cell proliferation and cell death. The subcellular locations of the proteins that carry out cell death are determined by largely independent cellular mechanisms, and standard bioinformatics tools often fail to predict the subcellular locations of such apoptotic proteins. This work proposes a model for the sorting of proteins involved in apoptosis, allowing us both to predict their subcellular locations and to identify the molecular properties that contribute to that localization. We report a novel hybrid Genetic Algorithm (GA)/Support Vector Machine (SVM) approach to predict apoptotic protein sequences using 119 sequence-derived properties such as the frequency of amino acid groups, secondary structure, and physicochemical properties. The GA is used to select a near-optimal subset of the informative features that are most relevant for the classification. Jackknife cross-validation is applied to test the predictive capability of the proposed method on 317 apoptosis proteins. Our method achieved 85.80% accuracy using all 119 features and 89.91% accuracy using 25 features selected by the GA. Our models were also evaluated on a test dataset of 98 apoptosis proteins, on which they obtained an overall accuracy of 90.34%. The results show that the proposed approach is promising: it is able to select small subsets of features and still improve the classification accuracy. Our model can contribute to the understanding of programmed cell death and to drug discovery. The software and dataset are available at http://www.inb.uni-luebeck.de/tools-demos/apoptosis/GASVM.
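The wrapper idea described in the abstract, a GA searching over feature subsets and scoring each subset by jackknife (leave-one-out) accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free, it substitutes a nearest-centroid classifier for the SVM, and it runs on synthetic data with 6 features rather than the 119 sequence-derived properties. All function names, GA settings (population size, mutation rate, single-point crossover, elitism) and the toy dataset are assumptions made for illustration.

```python
import random

def centroid_predict(train_X, train_y, x, mask):
    """Nearest-centroid classifier (a stand-in for the SVM, not the paper's model)."""
    idx = [j for j, keep in enumerate(mask) if keep]
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in idx]
        dist = sum((x[j] - c) ** 2 for j, c in zip(idx, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_accuracy(X, y, mask):
    """Leave-one-out accuracy: each sample is held out once and predicted from the rest."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += centroid_predict(train_X, train_y, X[i], mask) == y[i]
    return hits / len(X)

def ga_select(X, y, pop_size=20, generations=15, mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks; fitness is the jackknife accuracy of the mask."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return jackknife_accuracy(X, y, mask)

    # Start from the all-features mask plus random masks (illustrative choice).
    pop = [[True] * n]
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

def make_toy_data(n_per_class=15, seed=1):
    """Synthetic two-class data: features 0-1 are informative, features 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 3.0) for _ in range(4)]
            X.append(informative + noise)
            y.append(label)
    return X, y

X, y = make_toy_data()
mask, acc = ga_select(X, y)
print("selected features:", [j for j, keep in enumerate(mask) if keep])
print("jackknife accuracy: %.3f" % acc)
```

Because the initial population contains the all-features mask and the best mask is always carried forward, the selected subset in this sketch can never score below the all-features baseline under the same jackknife protocol. The accuracy figures in the abstract come from the authors' own GA/SVM pipeline and cannot be reproduced from this toy example.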