Rahman Julia, Mondal Md Nazrul Islam, Islam Md Khaled Ben, Hasan Md Al Mehedi
J Integr Bioinform. 2016 Dec 18;13(1):288. doi: 10.2390/biecoll-jib-2016-288.
Because protein subcellular localization matters in many branches of life science and in drug discovery, researchers have focused on predicting it. Effective feature representation of protein sequences plays a vital role in protein subcellular localization prediction, especially for machine learning techniques. Single feature representations such as pseudo amino acid composition (PseAAC), physicochemical property models (PPM), and amino acid index distribution (AAID) capture insufficient information from protein sequences. To address this, we propose two fused feature representations for use with Support Vector Machine classifiers: AAIDPAAC, which fuses PseAAC with AAID, and PPMPAAC, which fuses PseAAC with PPM. We evaluated both single and fused feature representations on a Gram-negative bacterial dataset. Compared with single feature representations, AAIDPAAC achieved at least 3% higher actual accuracy and PPMPAAC achieved 2% higher locative accuracy.
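The idea of fusing two sequence-derived feature vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the features are fused, so plain concatenation is assumed here. The function names (`composition`, `order_correlation`, `fuse`) are hypothetical, the Kyte-Doolittle hydropathy scale is only a stand-in property, and `order_correlation` is a simplified PseAAC-style sequence-order factor without the normalization and weighting of the full PseAAC definition.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as a stand-in property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the 20-dimensional amino acid composition (relative frequencies)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def order_correlation(seq, max_lag):
    """Return the mean squared hydropathy difference at lags 1..max_lag."""
    values = [HYDROPATHY[a] for a in seq]
    factors = []
    for lag in range(1, max_lag + 1):
        diffs = [(values[i] - values[i + lag]) ** 2
                 for i in range(len(values) - lag)]
        factors.append(sum(diffs) / len(diffs))
    return factors


def fuse(*feature_vectors):
    """Fuse feature vectors by concatenation (an assumption, not from the paper)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


seq = "MKKTAIAIAVALAGFATVAQA"  # arbitrary example sequence, not from the dataset
fused = fuse(composition(seq), order_correlation(seq, max_lag=3))
print(len(fused))  # 20 composition values plus 3 sequence-order factors
```

In a full pipeline, one fused vector per protein would be passed to a classifier such as a Support Vector Machine. That step is omitted here because the abstract gives no kernel or parameter details.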