School of Science, Xi'an Polytechnic University, Xi'an 710048, PR China.
School of Mathematics and Statistics, Xidian University, Xi'an 710071, PR China.
J Theor Biol. 2018 Oct 7;454:22-29. doi: 10.1016/j.jtbi.2018.05.035. Epub 2018 May 29.
Gram-negative bacterial secreted proteins are crucial for bacterial pathogenesis because they mediate the interaction between bacteria and their environment. Identifying these proteins is therefore an important step in research on various diseases and the corresponding drugs. In this paper, we develop a feature design model named ACCP-KL-NMF, which combines PSSM-based auto-cross correlation analysis for feature extraction with a nonnegative matrix factorization algorithm based on Kullback-Leibler divergence for dimensionality reduction. This yields a 150-dimensional feature vector constructed on the training set. A support vector machine is then adopted as the classifier, and the jackknife test, regarded as the most objective validation method, is used to evaluate accuracy. The ACCP-KL-NMF model achieves satisfactory overall accuracy on the test set and outperforms three existing models. The numerical experiments show that the model is effective and reliable for identifying Gram-negative bacterial secreted protein types. We anticipate that the proposed model could also be useful for other biological sequences in future research.
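The two feature-design steps named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`acc_features`, `kl_nmf`) are hypothetical, the auto-cross covariance formula is the commonly used definition and may differ from the paper's exact normalization, and the factorization uses the standard multiplicative updates for the Kullback-Leibler objective. The input matrix to `kl_nmf` must be nonnegative; how the paper ensures this for covariance features is not stated in the abstract. The SVM classifier and the jackknife test are not shown.

```python
import math
import random


def acc_features(pssm, max_lag):
    """Auto- and cross-covariance features of a PSSM given as L rows of equal width.

    Assumed definition: for each lag and each ordered column pair (c1, c2),
    average (S[j][c1] - mean[c1]) * (S[j + lag][c2] - mean[c2]) over positions j.
    Returns a flat list of max_lag * n_cols**2 values.
    """
    length, n_cols = len(pssm), len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    feats = []
    for lag in range(1, max_lag + 1):
        span = length - lag
        for c1 in range(n_cols):
            for c2 in range(n_cols):
                # c1 == c2 is an auto-covariance term, c1 != c2 a cross-covariance term.
                total = sum(
                    (pssm[j][c1] - means[c1]) * (pssm[j + lag][c2] - means[c2])
                    for j in range(span)
                )
                feats.append(total / span)
    return feats


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kl_divergence(v, wh, eps=1e-12):
    """Generalized KL divergence D(V || WH) summed over all entries."""
    total = 0.0
    for v_row, wh_row in zip(v, wh):
        for x, y in zip(v_row, wh_row):
            if x > 0:
                total += x * math.log((x + eps) / (y + eps))
            total += y - x
    return total


def kl_nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a nonnegative n x m matrix V as W (n x rank) times H (rank x m).

    Uses multiplicative updates for the KL objective. Each row of W is the
    reduced, rank-dimensional representation of the corresponding row of V.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update H with W held fixed.
        wh = matmul(w, h)
        ratio = [[v[i][j] / (wh[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(rank):
            col_sum = sum(w[i][a] for i in range(n)) + eps
            for j in range(m):
                h[a][j] *= sum(w[i][a] * ratio[i][j] for i in range(n)) / col_sum
        # Update W with the new H held fixed.
        wh = matmul(w, h)
        ratio = [[v[i][j] / (wh[i][j] + eps) for j in range(m)] for i in range(n)]
        for a in range(rank):
            row_sum = sum(h[a][j] for j in range(m)) + eps
            for i in range(n):
                w[i][a] *= sum(h[a][j] * ratio[i][j] for j in range(m)) / row_sum
    return w, h
```

In a pipeline of this shape, `acc_features` would be applied to each protein's PSSM, the resulting vectors stacked as rows of a nonnegative matrix, and `kl_nmf` called with `rank=150` to obtain the reduced vectors passed to the classifier. The pure-Python loops are for readability only; a real dataset would call for a numerical library.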