Department of Computer Science and Engineering, Information Engineering College, Shanghai Maritime University, 1550 Haigang Ave, Shanghai 201306, PR China.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S47. doi: 10.1186/1471-2105-11-S1-S47.
The type III secretion system (T3SS) is a specialized protein delivery system in gram-negative bacteria that injects proteins (called effectors) directly into the eukaryotic host cytosol and facilitates bacterial infection. For many plant and animal pathogens, T3SS is indispensable for disease development. Recently, T3SS has also been found in rhizobia, where it plays a crucial role in the nodulation process. Although a great deal of effort has gone into understanding type III secretion, the precise mechanism underlying secretion and translocation is not fully understood. In particular, no defined secretion or translocation signals have been identified in the type III secreted effectors (T3SEs), which makes the identification of these important virulence factors notoriously challenging. The availability of a large number of sequenced genomes for plant- and animal-associated bacteria calls for efficient and effective bioinformatics methods for predicting T3SEs.
We have developed a machine learning method based on N-terminal amino acid sequences to predict novel type III effectors in the plant pathogen Pseudomonas syringae and the microsymbiont rhizobia. The features used by the learning model (or classifier) include amino acid composition, secondary structure and solvent accessibility. The method achieved a precision of over 90% on P. syringae in a cross-validation study. In combination with a promoter screen for type III-specific promoters, the classifier trained on the P. syringae data was applied to predict novel T3SEs from the genomic sequences of four rhizobial strains. This application yielded 57 candidate type III secreted proteins, 17 of which are confirmed effectors.
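As a rough illustration of the amino acid composition feature and the precision metric mentioned above, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the 100-residue N-terminal window and the restriction to the 20 standard amino acids are all assumptions made for illustration, and the secondary structure and solvent accessibility features, which would require external prediction tools, are omitted.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nterm_composition(sequence, n_residues=100):
    """Fraction of each standard amino acid in the first n_residues.

    n_residues=100 is an assumed window length, not a value from the source.
    Non-standard symbols (e.g. 'X') are ignored when computing fractions.
    """
    window = sequence.upper()[:n_residues]
    counts = Counter(window)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def precision(true_labels, predicted_labels):
    """Precision = TP / (TP + FP), with label 1 meaning 'effector'."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    predicted_pos = true_pos + false_pos
    return true_pos / predicted_pos if predicted_pos else 0.0
```

Each protein would then be represented by a 20-element vector that a classifier of one's choice could consume alongside the other feature types.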
Our experimental results demonstrate that the machine learning method based on N-terminal amino acid sequences combined with a promoter screen could prove to be a very effective computational approach for predicting novel type III effectors in gram-negative bacteria. Our method and data are available to the public upon request.