Wagner Naama, Alburquerque Michael, Ecker Noa, Dotan Edo, Zerah Ben, Pena Michelle Mendonca, Potnis Neha, Pupko Tal
The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.
Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States.
Front Plant Sci. 2022 Oct 31;13:1024405. doi: 10.3389/fpls.2022.1024405. eCollection 2022.
Type III effectors are proteins injected by Gram-negative bacteria into eukaryotic host cells. In many plant and animal pathogens, these effectors manipulate host cellular processes to the benefit of the bacteria. Type III effectors are secreted by a type III secretion system, which must "classify" each bacterial protein as one that should be translocated or one that should not. Type III effectors were previously shown to carry a secretion signal within their N-terminus. However, despite numerous efforts, the exact biochemical identity of this secretion signal remains largely unknown. Computational characterization of the secretion signal is important both for identifying novel effectors and for better understanding the molecular mechanism of translocation. In this work, we developed novel machine-learning algorithms for characterizing the secretion signal in both plant and animal pathogens. Specifically, we represented each protein as a vector in a high-dimensional space using Facebook's protein language model. Classification algorithms were then used to separate effectors from non-effector proteins. We subsequently curated a benchmark dataset of hundreds of effectors and thousands of non-effector proteins. On this curated dataset, our approach yielded substantially better classification accuracy than previously developed methodologies. We also tested the hypothesis that plant and animal pathogen effectors are characterized by different secretion signals. Finally, we integrated the new approach into Effectidor, a web server for predicting type III effector proteins, leading to more accurate discrimination between effectors and non-effectors.
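The embed-then-classify idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `embed` is a hypothetical stand-in that uses amino-acid frequencies over an N-terminal window in place of a protein language model embedding, the classifier is a plain logistic regression (the abstract does not name the classification algorithms used), and the sequences are synthetic toy data with an arbitrary composition bias, not real effectors.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed(sequence, n_terminal=50):
    """Hypothetical stand-in embedding: amino-acid frequencies in the
    N-terminal window. A real pipeline would return a protein language
    model embedding here instead."""
    window = sequence[:n_terminal]
    return [window.count(aa) / max(len(window), 1) for aa in AMINO_ACIDS]

def train_logistic(vectors, labels, epochs=200, lr=1.0):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log-loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(model, x):
    """Return 1 (effector-like) or 0 (non-effector-like)."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0

def toy_sequence(rng, enriched, length=80):
    """Synthetic sequence with an arbitrary composition bias (toy data only)."""
    w = [5 if aa in enriched else 1 for aa in AMINO_ACIDS]
    return "".join(rng.choices(AMINO_ACIDS, weights=w, k=length))

rng = random.Random(0)
# Label 1 = toy "effector", label 0 = toy "non-effector".
train = [(toy_sequence(rng, "SP"), 1) for _ in range(20)] + \
        [(toy_sequence(rng, "LK"), 0) for _ in range(20)]
test = [(toy_sequence(rng, "SP"), 1) for _ in range(10)] + \
       [(toy_sequence(rng, "LK"), 0) for _ in range(10)]

model = train_logistic([embed(s) for s, _ in train], [y for _, y in train])
correct = sum(predict(model, embed(s)) == y for s, y in test)
accuracy = correct / len(test)
print(f"toy held-out accuracy: {accuracy:.2f}")
```

Swapping `embed` for a function that returns a language-model vector would leave the rest of the sketch unchanged, which is the main point of representing each protein as a fixed-length vector before classification.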