a School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China.
b School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China.
SAR QSAR Environ Res. 2019 Mar;30(3):181-194. doi: 10.1080/1062936X.2019.1573438. Epub 2019 Feb 11.
In Gram-negative bacteria, a wide range of proteins are secreted by highly specialized secretion systems. These secreted proteins play essential roles in the response of bacteria to their environment and in several physiological processes such as adhesion, pathogenicity, adaptation and survival. Identifying secreted proteins in Gram-negative bacteria may therefore help in understanding the secretion mechanism and in developing new antimicrobial strategies. Because a single-feature model is unlikely to cover this information comprehensively, this paper uses three kinds of feature models to represent protein samples, obtained by applying composition analysis, correlation analysis and a smoothing encoding method to position-specific scoring matrix (PSSM) profiles. A support vector machine-based ensemble method using these hybrid features was developed to predict multiple types of Gram-negative bacterial secreted proteins. The method achieves overall accuracies of 97.09% and 96.51% in an independent dataset test and a jackknife test, respectively, on a public test dataset; these are 3.49% and 2.32% higher, respectively, than the results obtained by other methods. These results show the effectiveness and stability of the proposed ensemble method. It is anticipated that the method will provide useful information for further research on bacterial secreted proteins and secretion systems.
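The three PSSM-based feature views named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: column means for composition, per-column auto-covariance at a given lag for correlation, and a centred window sum for smoothing. The function names, the toy matrix and the class labels are hypothetical, and the majority vote stands in for whatever combination rule the paper's SVM ensemble actually uses.

```python
from collections import Counter


def composition(pssm):
    """Column means of an L x C PSSM (assumed form of 'composition analysis')."""
    n = len(pssm)
    return [sum(row[j] for row in pssm) / n for j in range(len(pssm[0]))]


def auto_covariance(pssm, lag):
    """Per-column covariance between positions i and i + lag
    (assumed form of 'correlation analysis')."""
    n = len(pssm)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < sequence length")
    means = composition(pssm)
    return [
        sum((pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
            for i in range(n - lag)) / (n - lag)
        for j in range(len(means))
    ]


def smooth(pssm, window):
    """Replace each row by the sum of the rows inside a centred window
    (assumed form of 'smoothing encoding'); the window is clipped at the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n, cols = len(pssm), len(pssm[0])
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append([sum(pssm[k][j] for k in range(lo, hi)) for j in range(cols)])
    return smoothed


def majority_vote(labels):
    """Combine the labels predicted by the per-feature classifiers.
    Ties go to the label that appears first."""
    return Counter(labels).most_common(1)[0][0]


# Toy 3-residue, 2-column matrix; a real PSSM has 20 columns per residue.
toy_pssm = [[1, 0], [3, 2], [5, 4]]
comp_features = composition(toy_pssm)
corr_features = auto_covariance(toy_pssm, lag=2)
smoothed_pssm = smooth(toy_pssm, window=3)
# Hypothetical labels from three classifiers, one per feature view.
final_label = majority_vote(["type_a", "type_a", "type_b"])
```

In a full pipeline, each feature vector would be fed to its own support vector machine and the per-view predictions combined; the sketch only covers the feature construction and a simple combination step.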