Department of Microbiology, College of Basic Medical Sciences, Third Military Medical University (TMMU), Chongqing 40038, China and Department of Tuberculosis, Institute of Infectious TB Prevention, Third Hospital of PLA, Baoji, Shanxi 721006, China.
Bioinformatics. 2013 Dec 15;29(24):3135-42. doi: 10.1093/bioinformatics/btt554. Epub 2013 Sep 23.
Various human pathogens secrete effector proteins into host cells via the type IV secretion system (T4SS). These proteins play important roles in the interaction between bacteria and hosts. Computational methods for T4SS effector prediction have been developed to screen experimental targets in several isolated bacterial species; however, widely applicable prediction approaches are still unavailable.
In this work, four types of distinctive features were calculated from primary sequences: amino acid composition, dipeptide composition, position-specific scoring matrix (PSSM) composition, and the auto covariance transformation of the PSSM. A classifier, T4EffPred, was developed using a support vector machine trained on these features and their different combinations for effector prediction. Various theoretical tests were performed on a newly established dataset, and the results were measured with four indexes. We demonstrated that T4EffPred can discriminate IVA and IVB effectors in benchmark datasets with positive rates of 76.7% and 89.7%, respectively. The overall accuracy of 95.9% shows that the present method is accurate for distinguishing T4SS effectors among unidentified sequences. A classifier ensemble was designed to combine all single classifiers, and a notable performance improvement was observed with this ensemble system in benchmark tests. To demonstrate the model's application, a genome-scale prediction of effectors was performed in Bartonella henselae, an important zoonotic pathogen, and a number of putative candidates were identified.
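As an illustration of the sequence-derived features named above, the following is a minimal Python sketch, not the T4EffPred source code. The function names, the handling of non-standard residues, and the normalization of the auto covariance term (the commonly used form, dividing by L minus the lag) are assumptions made here for illustration; the PSSM is assumed to be supplied as a list of per-residue score rows, for example parsed from PSI-BLAST output.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return the 20 residue frequencies (non-standard residues are ignored)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400 adjacent-pair frequencies.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[p] / total for p in DIPEPTIDES]


def auto_covariance(pssm, max_lag):
    """Auto covariance transformation of a PSSM.

    pssm: list of L rows (one per residue), each a list of column scores.
    For each lag 1..max_lag and each column j, computes
    sum_i (S[i][j] - mean_j) * (S[i+lag][j] - mean_j) / (L - lag).
    Returns a flat list of max_lag * n_columns values.
    """
    length = len(pssm)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    n_cols = len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

Feature vectors built this way could be concatenated in different combinations and passed to any support vector machine implementation; the kernel, parameters, and ensemble scheme used by T4EffPred are not specified here.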
A web server implementing the prediction method and the source code are both available at http://bioinfo.tmmu.edu.cn/T4EffPred.