Liu Qi, Cai Weidong, Jin Dandan, Shen Jian, Fu Zhangjie, Liu Xiaodong, Linge Nigel
Jiangsu Collaborative Innovation Centre of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China.
School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China.
Sensors (Basel). 2016 Aug 30;16(9):1386. doi: 10.3390/s16091386.
Distributed computing has developed tremendously since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., the Internet of Things, Cyber-Physical Systems, and Big Data Analytics. Hadoop has become a data convergence platform for sensor networks. As one of its core components, MapReduce facilitates the allocation, processing, and mining of collected large-scale data, with speculative execution strategies helping to address the straggler problem. However, there is still no efficient solution for accurately estimating the execution time of running tasks, a gap that can affect task allocation and distribution in MapReduce. In this paper, task execution data are collected and used for this estimation. A two-phase regression (TPR) method is proposed to predict the finishing time of each task accurately. Detailed data for each task are examined, and a detailed analysis is reported. The results indicate that the prediction accuracy for the execution time of concurrent tasks can be improved, in particular for some regular jobs.
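The abstract does not spell out how TPR is formulated, so the following is only a minimal sketch of a generic two-phase (segmented) linear regression, not the authors' method. It assumes, hypothetically, that each running task reports (elapsed time, progress fraction) samples, that progress follows two roughly linear phases, and that the finishing time can be estimated by extrapolating the second phase to a progress of 1.0. All function names here (`fit_line`, `two_phase_fit`, `predict_finish_time`) are illustrative.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a*x + b; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = mean_y - a * mean_x
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def two_phase_fit(times: List[float], progress: List[float]):
    """Try every breakpoint, fit one line per phase, keep the split with the lowest total error.

    Returns (breakpoint_index, (slope1, intercept1), (slope2, intercept2)).
    """
    best = None
    # Each phase needs at least two samples to fit a line.
    for k in range(2, len(times) - 1):
        a1, b1, e1 = fit_line(times[:k], progress[:k])
        a2, b2, e2 = fit_line(times[k:], progress[k:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, k, (a1, b1), (a2, b2))
    if best is None:
        raise ValueError("need at least 4 samples for a two-phase fit")
    return best[1], best[2], best[3]


def predict_finish_time(times: List[float], progress: List[float]) -> float:
    """Extrapolate the second-phase line to progress == 1.0 (assumed completion)."""
    _, _, (a2, b2) = two_phase_fit(times, progress)
    if a2 <= 0:
        raise ValueError("second phase shows no forward progress")
    return (1.0 - b2) / a2
```

A scheduler could, under these assumptions, call `predict_finish_time` on the samples of each running task and treat tasks with unusually late estimates as straggler candidates for speculative execution. The exhaustive breakpoint search is quadratic in the number of samples, which is acceptable for the short progress histories assumed here.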