Patarroyo Camilo, Dupas Stéphane, Restrepo Silvia
Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia.
Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, Gif-sur-Yvette 91198, France.
Appl Plant Sci. 2024 Jul 23;12(5):e11603. doi: 10.1002/aps3.11603. eCollection 2024 Sep-Oct.
The prompt categorization of Phytophthora infestans isolates into described clonal lineages is a key tool for the management of its associated disease, potato late blight. New isolates of this pathogen are currently classified by comparing their microsatellite genotypes with those of characterized clonal lineages, but an automated classification tool would greatly improve this process. Here, we developed a flexible machine learning-based classifier for P. infestans genotypes.
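The comparison-based classification described above can be illustrated with a minimal sketch. It assumes a genotype is stored as a mapping from locus name to a tuple of allele sizes, and that a new isolate is assigned to the reference lineage with the fewest mismatching loci. The locus names, lineage names, and allele sizes are hypothetical placeholders, not data from the study, and the matching rule is an assumption rather than the procedure used by the authors.

```python
from typing import Dict, Tuple

# Assumed representation: locus name -> sorted allele sizes (bp).
Genotype = Dict[str, Tuple[int, ...]]


def locus_mismatches(a: Genotype, b: Genotype) -> int:
    """Count loci whose allele tuples differ between two genotypes."""
    loci = set(a) | set(b)
    return sum(1 for locus in loci if a.get(locus) != b.get(locus))


def closest_lineage(query: Genotype, references: Dict[str, Genotype]) -> Tuple[str, int]:
    """Return (lineage name, mismatch count) for the best-matching reference.

    Ties are broken alphabetically by lineage name so the result is deterministic.
    """
    best = min(
        references,
        key=lambda name: (locus_mismatches(query, references[name]), name),
    )
    return best, locus_mismatches(query, references[best])


# Hypothetical reference profiles and a hypothetical new isolate.
REFERENCES: Dict[str, Genotype] = {
    "LINEAGE_A": {"locus1": (152, 156), "locus2": (205, 213), "locus3": (118, 118)},
    "LINEAGE_B": {"locus1": (152, 152), "locus2": (209, 213), "locus3": (118, 122)},
}
new_isolate: Genotype = {"locus1": (152, 156), "locus2": (205, 213), "locus3": (118, 122)}

print(closest_lineage(new_isolate, REFERENCES))  # ('LINEAGE_A', 1)
```

A rule of this kind treats every locus equally and ignores how far apart two allele sizes are, which is one reason a trained classifier may be preferable.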
The performance of different machine learning algorithms in classifying genotypes into their clonal lineages was preliminarily evaluated with decreasing amounts of training data. The four best algorithms were then evaluated using all collected genotypes.
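The idea of evaluating a classifier with decreasing amounts of training data can be sketched as follows. This is an illustration only: the data are synthetic, the lineage names and allele sizes are hypothetical, and the 1-nearest-neighbour classifier is a simple stand-in, not one of the algorithms evaluated in the study. The held-out test set is kept fixed while the training subset shrinks.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], str]  # (allele-size vector, lineage label)


def make_toy_data(rng: random.Random, per_lineage: int = 30) -> List[Sample]:
    """Synthetic genotypes: a fixed profile per lineage plus occasional 2-bp shifts."""
    profiles = {  # hypothetical profiles, not real data
        "LINEAGE_A": (152, 156, 205, 213),
        "LINEAGE_B": (152, 152, 209, 213),
        "LINEAGE_C": (148, 156, 205, 217),
    }
    data: List[Sample] = []
    for label, profile in profiles.items():
        for _ in range(per_lineage):
            alleles = tuple(
                a + rng.choice((-2, 2)) if rng.random() < 0.1 else a for a in profile
            )
            data.append((alleles, label))
    return data


def predict_1nn(train: Sequence[Sample], x: Tuple[int, ...]) -> str:
    """Label of the training genotype with the smallest summed allele-size difference."""
    return min(train, key=lambda s: sum(abs(p - q) for p, q in zip(s[0], x)))[1]


def accuracy_by_training_fraction(
    data: Sequence[Sample],
    fractions: Sequence[float],
    rng: random.Random,
    test_share: float = 0.3,
) -> Dict[float, float]:
    """Hold out a fixed test set, then train on shrinking fractions of the rest."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    test, pool = shuffled[:n_test], shuffled[n_test:]
    results: Dict[float, float] = {}
    for frac in fractions:
        train = pool[: max(1, int(len(pool) * frac))]  # keep at least one sample
        correct = sum(predict_1nn(train, x) == y for x, y in test)
        results[frac] = correct / len(test)
    return results


rng = random.Random(0)  # fixed seed so the sketch is reproducible
for frac, acc in accuracy_by_training_fraction(make_toy_data(rng), [1.0, 0.5, 0.1], rng).items():
    print(f"training fraction {frac:.2f}: accuracy {acc:.2f}")
```

The split here is not stratified, so a very small training fraction may omit a lineage entirely; a stratified split would avoid that at the cost of a little more code.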
mlpML, cforest, nnet, and AdaBag performed best in the preliminary test, correctly classifying almost 100% of the genotypes. AdaBag performed significantly better than the others when tested using the complete data set (Tukey HSD, P < 0.001). This algorithm was then implemented in a web application for the automated classification of genotypes, which is freely available at https://github.com/cpatarroyo/genotypeclas.
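The abstract does not describe how AdaBag works internally. As a rough illustration of ensemble classification in general, the sketch below shows bootstrap aggregation (bagging) with a majority vote: several base classifiers are fitted to resampled copies of the training data and the most common prediction is returned, together with the share of votes it received. The 1-nearest-neighbour base learner, the function names, and the toy genotypes are all assumptions made for illustration and are not the authors' implementation.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], str]  # (allele-size vector, lineage label)


def predict_1nn(train: Sequence[Sample], x: Tuple[int, ...]) -> str:
    """Base learner (a stand-in): label of the closest training genotype."""
    return min(train, key=lambda s: sum(abs(p - q) for p, q in zip(s[0], x)))[1]


def bagged_predict(
    train: Sequence[Sample],
    x: Tuple[int, ...],
    n_models: int = 25,
    seed: int = 0,
) -> Tuple[str, float]:
    """Majority vote over base learners fitted to bootstrap resamples.

    Returns the winning label and the fraction of models that voted for it.
    """
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_models):
        # Bootstrap: draw len(train) samples with replacement.
        resample = [rng.choice(train) for _ in train]
        votes[predict_1nn(resample, x)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_models


# Hypothetical training genotypes for two placeholder lineages.
TRAIN: List[Sample] = [
    ((152, 156, 205, 213), "LINEAGE_A"),
    ((152, 156, 205, 211), "LINEAGE_A"),
    ((152, 158, 205, 213), "LINEAGE_A"),
    ((150, 156, 205, 213), "LINEAGE_A"),
    ((152, 156, 207, 213), "LINEAGE_A"),
    ((152, 152, 209, 213), "LINEAGE_B"),
    ((152, 152, 209, 215), "LINEAGE_B"),
    ((150, 152, 209, 213), "LINEAGE_B"),
    ((152, 152, 211, 213), "LINEAGE_B"),
    ((152, 154, 209, 213), "LINEAGE_B"),
]

label, support = bagged_predict(TRAIN, (152, 156, 205, 215))
print(label, f"{support:.2f}")
```

Reporting the vote share alongside the label gives a user of such a tool a rough indication of how contested an assignment was.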
We developed a gradient boosting-based tool to automatically classify genotypes into their clonal lineages. This could become a valuable resource for the prompt identification of clonal lineages spreading into new regions.