A machine learning algorithm for the automatic classification of genotypes into clonal lineages.

Authors

Patarroyo Camilo, Dupas Stéphane, Restrepo Silvia

Affiliations

Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia.

Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, Gif-sur-Yvette 91198, France.

Publication information

Appl Plant Sci. 2024 Jul 23;12(5):e11603. doi: 10.1002/aps3.11603. eCollection 2024 Sep-Oct.

Abstract

PREMISE

The prompt categorization of Phytophthora infestans isolates into described clonal lineages is a key tool for the management of its associated disease, potato late blight. New isolates of this pathogen are currently classified by comparing their microsatellite genotypes with those of characterized clonal lineages, but an automated classification tool would greatly improve this process. Here, we developed a flexible machine learning-based classifier for P. infestans genotypes.
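The manual comparison described above can be pictured as a nearest-reference lookup. The Python sketch below is illustrative only: the locus names, allele sizes, lineage labels, and the simple allele-mismatch distance are all hypothetical assumptions, not the authors' data or their comparison method.

```python
# Toy sketch: assign a new isolate to the closest reference lineage.
# All loci, allele sizes, and lineage labels below are invented for illustration.

def allele_mismatch(genotype_a, genotype_b):
    """Fraction of shared loci whose allele sets differ between two genotypes."""
    loci = genotype_a.keys() & genotype_b.keys()
    if not loci:
        raise ValueError("genotypes share no loci")
    differing = sum(
        1 for locus in loci if set(genotype_a[locus]) != set(genotype_b[locus])
    )
    return differing / len(loci)


def closest_lineage(isolate, references):
    """Return (lineage, distance) for the reference genotype nearest to the isolate."""
    return min(
        ((name, allele_mismatch(isolate, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical reference genotypes: locus -> pair of allele sizes (bp).
REFERENCES = {
    "lineage_A": {"locus1": (152, 156), "locus2": (210, 210), "locus3": (98, 102)},
    "lineage_B": {"locus1": (152, 152), "locus2": (206, 210), "locus3": (98, 98)},
}

new_isolate = {"locus1": (152, 156), "locus2": (210, 210), "locus3": (98, 98)}
lineage, distance = closest_lineage(new_isolate, REFERENCES)
print(lineage, round(distance, 2))  # prints: lineage_A 0.33
```

A lookup of this kind needs a distance measure and a cut-off chosen by hand, which is one reason a trained classifier may be more practical.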

METHODS

The performance of different machine learning algorithms in classifying P. infestans genotypes into their clonal lineages was first evaluated in a preliminary test with decreasing amounts of training data. The four best-performing algorithms were then evaluated using all collected genotypes.
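The idea of testing a classifier with decreasing amounts of training data can be sketched as follows. This is a minimal illustration under stated assumptions: the genotypes are simulated, the split proportions are arbitrary, and a one-nearest-neighbour rule stands in for the algorithms compared in the study, which are not reimplemented here.

```python
import random


def make_dataset(rng, n_lineages=4, n_loci=8, per_lineage=30, mutation_rate=0.1):
    """Simulate clonal genotypes: each lineage is a prototype plus rare allele shifts."""
    data = []
    for lineage in range(n_lineages):
        prototype = [rng.randrange(100, 200, 2) for _ in range(n_loci)]
        for _ in range(per_lineage):
            genotype = [
                allele + rng.choice((-2, 2)) if rng.random() < mutation_rate else allele
                for allele in prototype
            ]
            data.append((genotype, lineage))
    return data


def predict_1nn(train, genotype):
    """Label of the training genotype with the fewest mismatching loci (stand-in model)."""
    def mismatches(other):
        return sum(a != b for a, b in zip(genotype, other))
    return min(train, key=lambda item: mismatches(item[0]))[1]


def accuracy(train, test):
    """Share of test genotypes assigned to their true lineage."""
    hits = sum(predict_1nn(train, genotype) == label for genotype, label in test)
    return hits / len(test)


def learning_curve(data, fractions, rng, test_share=0.3):
    """Hold out a fixed test set, then train on shrinking subsets of the remainder."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    test, pool = shuffled[:n_test], shuffled[n_test:]
    curve = {}
    for fraction in fractions:
        n_train = max(1, int(len(pool) * fraction))
        curve[fraction] = accuracy(pool[:n_train], test)
    return curve


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = make_dataset(rng)
curve = learning_curve(data, fractions=(1.0, 0.5, 0.25, 0.1), rng=rng)
for fraction, acc in curve.items():
    print(f"train fraction {fraction:.2f}: accuracy {acc:.2f}")
```

Keeping the test set fixed while only the training subset shrinks means that any change in accuracy reflects the amount of training data rather than a different set of test genotypes.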

RESULTS

mlpML, cforest, nnet, and AdaBag performed best in the preliminary test, each correctly classifying almost 100% of the genotypes. When tested using the complete data set, AdaBag performed significantly better than the other three algorithms (Tukey HSD, P < 0.001). This algorithm was then implemented in a web application for the automated classification of P. infestans genotypes, which is freely available at https://github.com/cpatarroyo/genotypeclas.

DISCUSSION

We developed a gradient boosting-based tool to automatically classify P. infestans genotypes into their clonal lineages. This could become a valuable resource for the prompt identification of clonal lineages spreading into new regions.
