College of Information, Shanghai Ocean University, Shanghai 201306, China.
Comput Math Methods Med. 2021 Jan 6;2021:6690299. doi: 10.1155/2021/6690299. eCollection 2021.
Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in bioinformatics because of its crucial role in understanding host-pathogen interactions and in developing better therapeutic targets against pathogens. However, recognizing all effector proteins with traditional experimental approaches is often time-consuming and laborious. Developing computational methods that accurately predict putative novel effectors is therefore important for reducing the number of biological experiments needed for validation. In this study, we proposed a method, called iT3SE-PX, that identifies T3SEs solely from protein sequences. First, three kinds of features were extracted from position-specific scoring matrix (PSSM) profiles to train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was used to rank these features by their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier that predicts T3SEs. On the two benchmark datasets, we conducted 100 rounds of randomized 5-fold cross-validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method outperformed most existing methods and could serve as a useful tool for identifying putative T3SEs from sequence information alone.
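The abstract does not name the three PSSM-derived features, the ranking cutoff, or whether the CV folds are stratified, so the following is only a minimal, dependency-free sketch of the pipeline's shape under stated assumptions. `pssm_composition` computes one common PSSM feature (a 20 x 20 "PSSM-400" composition, summed per residue type and divided by sequence length), which is an assumed example rather than a confirmed part of iT3SE-PX. `select_top_features` keeps the k highest-scoring feature indices from an importance mapping (hypothetical input standing in for XGBoost scores), and `repeated_kfold` generates the train/test index splits for repeated randomized, non-stratified k-fold CV. All three function names are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

# Column order of a standard PSI-BLAST PSSM (20 amino acids).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_composition(sequence: str, pssm: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical PSSM-400 feature (an assumed example, not necessarily one of
    the paper's three features): for each of the 20 residue types, sum the PSSM
    rows at positions holding that residue, then divide by the sequence length."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    sums = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 scores")
        i = index.get(residue)
        if i is None:  # skip non-standard residues such as 'X'
            continue
        for j, score in enumerate(row):
            sums[i][j] += score
    length = len(sequence)
    # Flatten the 20 x 20 matrix row by row into a 400-dimensional vector.
    return [value / length for row in sums for value in row]


def select_top_features(importances: Dict[int, float], k: int) -> List[int]:
    """Return the indices of the k features with the highest importance score.
    The scores are assumed to come from a trained gradient-boosting model."""
    ranked = sorted(importances, key=lambda idx: importances[idx], reverse=True)
    return ranked[:k]


def repeated_kfold(n_samples: int, k: int = 5, repeats: int = 100, seed: int = 0
                   ) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for `repeats` rounds of
    randomized, non-stratified k-fold CV; each round reshuffles before splitting."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(indices)
        # Slicing copies, so later reshuffles do not alter folds already yielded.
        folds = [indices[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
            yield r, train, test


if __name__ == "__main__":
    features = pssm_composition("MKV", [[1.0] * 20, [2.0] * 20, [3.0] * 20])
    print(len(features))  # 400
    print(select_top_features({0: 0.1, 1: 0.7, 2: 0.4}, k=2))  # [1, 2]
    print(len(list(repeated_kfold(10, k=5, repeats=100))))  # 500
```

In a full pipeline the importance scores could be read from the `feature_importances_` attribute of a fitted `xgboost.XGBClassifier`, and the selected feature columns passed to scikit-learn's `SVC`, with metrics averaged over all 100 x 5 splits. Those libraries are left out here to keep the sketch self-contained, and a stratified split may be closer to what the authors used.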