Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, 19104-6021, USA.
Department of Computer Science, Ben-Gurion University, Beer Sheva, 84105, Israel.
Sci Rep. 2021 Feb 11;11(1):3629. doi: 10.1038/s41598-021-83247-4.
Conservation machine learning conserves models across runs, users, and experiments, and puts them to good use. We previously demonstrated the merit of this idea through a small-scale preliminary experiment involving a single dataset source, 10 datasets, and a single so-called cultivation method used to produce the final ensemble. In this paper, focusing on classification tasks, we perform extensive experimentation with conservation random forests, involving 5 cultivation methods (including a novel one introduced herein, lexigarden), 6 dataset sources, and 31 datasets. We show that significant improvement can be attained by making use of models we already possess, and we envisage repositories of models (not merely of datasets, solutions, or code) that could be made available to everyone, thus having conservation live up to its name and furthering the cause of data and computational science.
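The core idea of conserving models and cultivating an ensemble from them can be illustrated with a minimal sketch. This is not the authors' implementation: the toy `ThresholdModel` class is a hypothetical stand-in for models trained in separate runs, and majority voting stands in for one of the simpler possible cultivation methods (the paper's methods, including lexigarden, are more elaborate).

```python
# Hedged sketch of conservation ensembling: models retained from several
# independent runs are pooled in a repository and combined by majority vote.
from collections import Counter

class ThresholdModel:
    """Toy stand-in for a trained binary classifier from one run (hypothetical)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        # Predict class 1 if the input exceeds this model's threshold.
        return 1 if x >= self.threshold else 0

def majority_vote_ensemble(models, x):
    """Cultivate a prediction by majority vote over all conserved models."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]

# Models "conserved" from three separate runs, pooled into one repository:
repository = [ThresholdModel(t) for t in (0.3, 0.5, 0.7)]
print(majority_vote_ensemble(repository, 0.6))  # two of three models vote 1
```

The point of the sketch is that the repository grows for free: each run's models would otherwise be discarded, yet pooling them costs nothing extra and the ensemble can outperform any single run's model.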