Nural Mustafa V, Cotterell Michael E, Peng Hao, Xie Rui, Ma Ping, Miller John A
Department of Computer Science, University of Georgia, Athens.
Department of Statistics, University of Georgia, Athens.
Int J Big Data. 2015 Oct;2(2):43-56. doi: 10.29268/stbd.2015.2.2.4.
Predictive analytics is taking on an increasingly important role in the big data era. Issues related to the choice of modeling technique, estimation procedure (or algorithm), and efficient execution can present significant challenges. For example, selecting appropriate and optimal models for big data analytics often requires careful investigation and considerable expertise, which might not always be readily available. In this paper, we propose to use semantic technology to assist data analysts and data scientists in selecting appropriate modeling techniques and building specific models, and to provide the rationale for the techniques and models selected. To formally describe the modeling techniques, models, and results, we developed the Analytics Ontology, which supports inferencing for semi-automated model selection. The SCALATION framework, which currently supports over thirty modeling techniques for predictive big data analytics, is used as a testbed for evaluating the use of semantic technology.