Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics.

Affiliations

Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H-1117 Budapest, Hungary.

Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H-1117 Budapest, Hungary.

Publication information

Molecules. 2019 Aug 1;24(15):2811. doi: 10.3390/molecules24152811.

DOI: 10.3390/molecules24152811

PMID: 31374986

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6695655/

Abstract

Machine learning classification algorithms are widely used to predict and classify various properties of molecules, such as toxicity or biological activity. Predicting whether a molecule is toxic or non-toxic is important because it can reduce reliance on testing on living animals, which has both ethical and cost drawbacks. The quality of a classification model can be assessed with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison of different performance metrics and machine learning classification methods. Well-established, standardized protocols were used for the machine learning tasks in each case. The comparison was applied to three datasets (acute and aquatic toxicities), and the robust yet sensitive sum of ranking differences (SRD) and analysis of variance (ANOVA) were used for evaluation. The effects of dataset composition (balanced vs. imbalanced) and of 2-class vs. multiclass classification scenarios were also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
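Sum of ranking differences (SRD), one of the evaluation tools named above, ranks the objects once by each compared method and once by a reference, then sums the absolute rank differences; a smaller SRD means closer agreement with the reference. The Python sketch below illustrates the idea under stated assumptions that are not taken from the study: rows are objects (for example, models), columns are the compared methods (for example, performance metrics on a comparable scale), and the reference is the row-wise mean, one common consensus choice. Tied values receive averaged ranks, and each SRD is normalized to the largest value possible for the number of rows. The full SRD procedure also validates the results (for example, with a randomization test), which this sketch omits. The function names and the score matrix are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(column: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of absolute differences between the column's ranks and the reference ranks."""
    return sum(abs(a - b) for a, b in zip(average_ranks(column), average_ranks(reference)))


def max_srd(n: int) -> int:
    """Largest possible SRD for n objects (one ranking is the exact reverse of the other)."""
    return n * n // 2


def srd_table(matrix: List[List[float]]) -> List[float]:
    """Normalized SRD (in %) of every column against the row-mean reference.

    Assumption: rows are objects (e.g. models), columns are compared methods
    (e.g. performance metrics), and the row-wise mean serves as the reference.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    reference = [sum(row) / n_cols for row in matrix]
    scale = max_srd(n_rows)
    return [100.0 * srd([row[j] for row in matrix], reference) / scale for j in range(n_cols)]


if __name__ == "__main__":
    # Hypothetical scores (not data from the study): 4 models x 3 metrics.
    scores = [
        [0.90, 0.60, 0.70],
        [0.85, 0.80, 0.75],
        [0.80, 0.70, 0.65],
        [0.95, 0.50, 0.60],
    ]
    print(srd_table(scores))  # [75.0, 25.0, 0.0]
```

In this toy matrix the third column orders the four models exactly as the row-mean consensus does (SRD = 0%), while the first column nearly reverses it (75%). A real comparison would also check whether such values differ from what random rankings would produce.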
