Calia Giulia, Cestaro Alessandro, Schuler Hannes, Janik Katrin, Donati Claudio, Moser Mirko, Bottini Silvia
Faculty of Agricultural, Environmental and Food Sciences, Free University of Bolzano, 39100 Bolzano, Italy.
Research and Innovation Centre, Fondazione Edmund Mach, 38010 San Michele all'Adige, Italy.
NAR Genom Bioinform. 2024 Jul 30;6(3):lqae087. doi: 10.1093/nargab/lqae087. eCollection 2024 Sep.
The 'Candidatus Phytoplasma' genus, a group of fastidious phloem-restricted bacteria, can infect a wide variety of both ornamental and agro-economically important plants. Phytoplasmas secrete effector proteins responsible for the symptoms associated with the disease. Identifying and characterizing these proteins is of prime importance for expanding our knowledge of the molecular bases of the disease. We addressed the challenge of identifying phytoplasma effectors by developing LEAPH, a machine learning ensemble predictor composed of four models. LEAPH was trained on 479 proteins from 53 phytoplasma species, each described by 30 features. LEAPH achieved 97.49% accuracy, 95.26% precision and 98.37% recall, ensuring a low false-positive rate and outperforming available state-of-the-art methods. Applying LEAPH to 13 phytoplasma proteomes yielded a comprehensive landscape of 2089 putative pathogenicity proteins. We identified three classes according to different secretion models: 'classical', 'classical-like' and 'non-classical'. Importantly, LEAPH identified 15 out of 17 known experimentally validated effectors belonging to the three classes. Furthermore, to help the selection of novel candidates for biological validation, we applied the Self-Organizing Maps algorithm and developed a Shiny app called EffectorComb. LEAPH and the EffectorComb app can be used to boost the characterization of putative effectors at both computational and experimental levels, and can be employed in other phytopathological models.
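To make the reported metrics concrete, the following is a minimal sketch of how an ensemble of four models can be combined and scored with accuracy, precision and recall. The abstract does not state how LEAPH combines its four models, so the soft-voting rule (averaging per-model probabilities and thresholding at 0.5) is an assumption, and the function names `ensemble_predict` and `classification_metrics`, the probabilities and the labels are all hypothetical toy values, not data from the study.

```python
from statistics import mean


def ensemble_predict(model_probs, threshold=0.5):
    """Soft voting (assumed rule, not confirmed by the source):
    average the effector probability given by each model for each
    protein, then call it an effector if the mean reaches the threshold."""
    return [int(mean(per_protein) >= threshold) for per_protein in zip(*model_probs)]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall from binary labels (1 = effector)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision falls as false positives rise, which is why a high
        # precision corresponds to a low false-positive burden.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical probabilities from four models (rows) for six proteins (columns).
model_probs = [
    [0.90, 0.2, 0.7, 0.4, 0.1, 0.8],
    [0.80, 0.1, 0.6, 0.6, 0.3, 0.9],
    [0.70, 0.3, 0.4, 0.7, 0.2, 0.6],
    [0.95, 0.4, 0.5, 0.5, 0.1, 0.7],
]
y_true = [1, 0, 1, 0, 0, 1]  # toy ground-truth labels

y_pred = ensemble_predict(model_probs)
metrics = classification_metrics(y_true, y_pred)
print(y_pred, metrics)
```

In this toy run the fourth protein is a false positive, which lowers precision while leaving recall untouched; this is the trade-off behind reporting the three metrics together.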