Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks.

Author information

Henao-Rojas Juan Camilo, Rosero-Alpala María Gladis, Ortiz-Muñoz Carolina, Velásquez-Arroyo Carlos Enrique, Leon-Rueda William Alfonso, Ramírez-Gil Joaquín Guillermo

Affiliations

Corporación Colombiana de Investigación Agropecuaria-AGROSAVIA, Centro de Investigación La Selva- Km 7, 250047 Ríonegro, Colombia.

Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, 111321 Sede Bogotá, Colombia.

Publication information

Plants (Basel). 2021 Jan 28;10(2):247. doi: 10.3390/plants10020247.

Abstract

Machine learning (ML) and its many applications offer comparative advantages for improving the interpretation of knowledge about different agricultural processes. However, several challenges still impede its proper use, as can be seen in the phenotypic characterization of germplasm banks. The objective of this research was to test and optimize different ML-based analysis methods for prioritizing and selecting morphological descriptors of Rubus spp. Fifty-five descriptors were evaluated in 26 genotypes, and the weight and discriminating capacity of each one were determined. ML methods such as random forest (RF), support vector machines (in linear and radial forms), and neural networks were optimized and compared. The results were then validated with two discriminating methods and their variants: hierarchical agglomerative clustering and K-means. RF showed the highest accuracy (0.768) of the methods evaluated and selected 11 descriptors based on purity (Gini index), importance, number of connected trees, and significance (p value < 0.05). Additionally, the K-means method applied to the RF-optimized descriptors had greater discriminating power on Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML for optimizing specific morphological variables for the characterization of plant germplasm banks.
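To make the idea of Gini-based descriptor selection followed by clustering concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' pipeline. It scores each descriptor by the largest Gini-impurity decrease achievable with a single threshold split (a one-level decision stump, used here as a simple stand-in for random forest Gini importance, which averages such decreases over many trees), keeps the top-ranked descriptor, and then groups accessions with plain K-means. The toy data, the function names (`gini`, `best_split_gain`, `rank_descriptors`, `kmeans`), and the choice to keep only one descriptor are all hypothetical and for illustration only.

```python
import random
from collections import Counter

# Hypothetical sketch: stump-based Gini ranking stands in for RF importance;
# this is NOT the study's code, and the data below are invented.


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest Gini decrease from one threshold split on a single descriptor."""
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_descriptors(X, y):
    """Rank descriptor (column) indices by decreasing single-split Gini gain."""
    gains = [best_split_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    order = sorted(range(len(gains)), key=lambda j: -gains[j])
    return order, gains


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments stopped changing: converged
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical toy data: 6 accessions x 3 numeric descriptors, 2 known groups.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.5],
    [0.9, 4.0, 6.0],
    [5.0, 4.5, 2.2],
    [5.3, 3.2, 6.1],
    [4.8, 5.1, 6.3],
]
y = ["A", "A", "A", "B", "B", "B"]

ranking, gains = rank_descriptors(X, y)
top = ranking[:1]  # hypothetical choice: keep only the best-ranked descriptor
reduced = [[row[j] for j in top] for row in X]
clusters = kmeans(reduced, k=2)
print("ranking:", ranking, "clusters:", clusters)
```

In practice, library implementations such as scikit-learn's `RandomForestClassifier` (whose `feature_importances_` attribute reports impurity-based importance) and `KMeans` would replace these hand-written helpers; the pure-Python version only exposes the arithmetic behind ranking descriptors by Gini decrease and clustering on the reduced set.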
