Simidjievski Nikola, Todorovski Ljupčo, Džeroski Sašo
Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia.
Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.
PLoS One. 2016 Apr 14;11(4):e0153507. doi: 10.1371/journal.pone.0153507. eCollection 2016.
Ensembles are a well-established machine learning paradigm that yields accurate and robust models and is predominantly applied to predictive modeling tasks. An ensemble comprises a finite set of diverse predictive models whose combined output is expected to improve predictive performance over that of an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles through sampling domain-specific knowledge instead of sampling data. We apply the proposed method to a set of automated predictive modeling problems in three lake ecosystems, using a library of process-based knowledge for modeling population dynamics, and evaluate its performance on these problems. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics than individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.
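The core idea of sampling knowledge rather than data can be illustrated with a minimal toy sketch. It is not the authors' algorithm or software: the four-entry `LIBRARY` of process templates, the one-constant model form dx/dt = c * term(x), the closed-form fit, the forward-Euler simulation, and all function names are assumptions made purely for illustration. Each ensemble member sees the full time series but only a random subset of the library, and the ensemble prediction is the average of the members' simulated trajectories.

```python
import random

# Hypothetical toy "library" of process templates (an assumption, not the
# library used in the paper). A model is dx/dt = c * term(x), one constant c.
LIBRARY = {
    "exponential_growth": lambda x: x,
    "logistic_growth": lambda x: x * (1.0 - x / 10.0),  # capacity 10 is assumed
    "constant_inflow": lambda x: 1.0,
    "saturating_growth": lambda x: x / (1.0 + x),
}

def fit_constant(term, xs, dt):
    """Closed-form least-squares fit of c in dx/dt = c * term(x)."""
    rates = [(xs[i + 1] - xs[i]) / dt for i in range(len(xs) - 1)]
    feats = [term(x) for x in xs[:-1]]
    denom = sum(f * f for f in feats)
    return sum(f * r for f, r in zip(feats, rates)) / denom if denom else 0.0

def simulate(term, c, x0, steps, dt):
    """Forward-Euler simulation starting from the initial state x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * c * term(xs[-1]))
    return xs

def sse(a, b):
    """Sum of squared errors between two equally long trajectories."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def learn_base_model(names, xs, dt):
    """Among the sampled templates, keep the one whose simulation fits best."""
    best = None
    for name in names:
        term = LIBRARY[name]
        c = fit_constant(term, xs, dt)
        err = sse(simulate(term, c, xs[0], len(xs) - 1, dt), xs)
        if best is None or err < best[0]:
            best = (err, name, c)
    return best[1], best[2]

def learn_ensemble(xs, dt, n_members=5, sample_size=2, seed=0):
    """Each member gets all the data but only a random subset of the library."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        subset = rng.sample(sorted(LIBRARY), sample_size)
        members.append(learn_base_model(subset, xs, dt))
    return members

def predict(ensemble, x0, steps, dt):
    """Average the simulated trajectories of all ensemble members."""
    runs = [simulate(LIBRARY[n], c, x0, steps, dt) for n, c in ensemble]
    return [sum(col) / len(runs) for col in zip(*runs)]

# Synthetic observations generated from the logistic template (toy data only).
observed = simulate(LIBRARY["logistic_growth"], 0.5, 1.0, 40, 0.1)
ensemble = learn_ensemble(observed, dt=0.1)
forecast = predict(ensemble, observed[0], steps=40, dt=0.1)
```

In this sketch the saving comes from each member searching only `sample_size` templates instead of the whole library, whereas a bagging-style ensemble would search the full library once per resampled data set. How the actual method samples the knowledge library and combines the base models is not specified in the abstract.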