Löwer Martin, Schneider Gisbert
Johann Wolfgang Goethe-University, Chair for Chem- and Bioinformatics, Frankfurt, Germany.
PLoS One. 2009 Jun 15;4(6):e5917. doi: 10.1371/journal.pone.0005917.
Pathogenic bacteria that infect animals and plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system is one such mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown.
METHODOLOGY/PRINCIPAL FINDINGS: In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases, and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. The classifiers achieved a cross-validated Matthews correlation coefficient of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% of proteins predicted as candidates), their chromosomes (11%) and plasmids (13%), as well as in 213 Firmicute genomes (7%).
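For readers unfamiliar with the reported metric, the following is a minimal Python sketch of how a Matthews correlation coefficient is computed from confusion-matrix counts, together with one simple way to summarize the first 30 residues of a protein. The function names `matthews_corrcoef` and `n_terminal_composition` are hypothetical, and the amino-acid composition encoding is an assumption chosen for illustration; the abstract does not state which sequence encoding the authors used.

```python
import math

# The 20 standard amino acids, in a fixed order for the feature vector.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention,
    since the coefficient is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def n_terminal_composition(sequence, length=30):
    """Fraction of each standard amino acid in the first `length` residues.

    Illustrative encoding only (an assumption, not the published method).
    Residues outside ALPHABET are ignored.
    """
    window = sequence[:length].upper()
    residues = [aa for aa in window if aa in ALPHABET]
    if not residues:
        return [0.0] * len(ALPHABET)
    return [residues.count(aa) / len(residues) for aa in ALPHABET]
```

The coefficient ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than raw accuracy when effectors are a small minority of all proteins.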
CONCLUSIONS/SIGNIFICANCE: We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).