Arif Muhammad, Hayat Maqsood, Jan Zahoor
Department of Computer Science, Abdul Wali Khan University Mardan, KP, Pakistan.
J Theor Biol. 2018 Apr 7;442:11-21. doi: 10.1016/j.jtbi.2018.01.008. Epub 2018 Jan 11.
Membrane proteins play significant roles in the cellular processes of living organisms, ranging from cell signaling to cell adhesion. Because they are a major part of the cell, identifying membrane proteins and their functional types has been a challenging task in bioinformatics and proteomics over the last few decades. Traditional experimental procedures are of limited applicability because recognized structures are lacking and the procedures demand enormous time and space. The demand for fast, accurate and intelligent computational methods is therefore increasing day by day. In this paper, a two-tier intelligent automated predictor called iMem-2LSAAC has been developed. The first tier (phase 1) classifies a protein sequence as membrane or non-membrane; for a membrane protein, the second tier (phase 2) identifies its functional type. Quantitative attributes were extracted from protein sequences by applying three discrete feature extraction schemes, namely amino acid composition, pseudo amino acid composition and split amino acid composition (SAAC). Various learning algorithms were investigated using the jackknife test to select the best one for the predictor. Experimental results showed that the highest predictive outcomes were yielded by SVM in conjunction with the SAAC feature space on all examined datasets. The true classification rate of the iMem-2LSAAC predictor is significantly higher than that of the other state-of-the-art methods reported in the literature so far. Finally, it is expected that the proposed predictor will provide a solid framework for pharmaceutical drug discovery and might be useful for researchers and academia.
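To make the SAAC feature space concrete, the following is a minimal Python sketch of split amino acid composition as it is commonly described: the sequence is cut into an N-terminal, a middle and a C-terminal segment, the 20-value amino acid composition is computed for each, and the three are concatenated into a 60-value vector. The abstract does not state the segment lengths, so the 25-residue terminal segments, the function names `aac` and `saac`, and the handling of non-standard residues are assumptions made here for illustration, not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(segment):
    """Amino acid composition: relative frequency of each standard residue.

    Non-standard symbols (e.g. 'X') are ignored; this is an assumption.
    """
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def saac(sequence, n_term=25, c_term=25):
    """Split amino acid composition: AAC of the N-terminal, middle and
    C-terminal segments, concatenated into one 60-value vector.

    The segment lengths n_term and c_term are assumed values.
    """
    seq = sequence.upper()
    if len(seq) <= n_term + c_term:
        raise ValueError("sequence too short for the chosen segment lengths")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return aac(head) + aac(middle) + aac(tail)


if __name__ == "__main__":
    # Toy sequence, not a real protein: 25 x A, 10 x C, 25 x D.
    features = saac("A" * 25 + "C" * 10 + "D" * 25)
    print(len(features))
```

A vector of this kind could then be passed to any classifier, such as an SVM evaluated with leave-one-out (jackknife) testing; that step is omitted here because the abstract gives no kernel or parameter settings.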