Department of Computer Science Education, Universitas Pendidikan Indonesia, Bandung 40154, Indonesia.
Department of Physics Education, Universitas Pendidikan Indonesia, Bandung 40154, Indonesia.
Sensors (Basel). 2022 Jul 6;22(14):5071. doi: 10.3390/s22145071.
Big data now arises in many fields of knowledge, including astronomy. One example of a large astronomical dataset is the numerically integrated time series of asteroid orbital elements, which can span millions to billions of years. From such data, the mean motion resonance (MMR) of an asteroid is used to determine how long the asteroid remained in a resonance state with a particular planet. This research therefore designs a computational model that obtains the mean motion resonance quickly and effectively by modifying the Symbolic Aggregate Approximation (SAX) algorithm and the random projection motif discovery algorithm and implementing them on big data platforms (i.e., Apache Hadoop and Apache Spark). The model has five steps: (i) saving the data to the Hadoop Distributed File System (HDFS); (ii) loading the files into Resilient Distributed Datasets (RDDs); (iii) preprocessing the data; (iv) performing motif discovery by executing a User-Defined Function (UDF) program; and (v) writing the UDF results to HDFS and to a .csv file. The results indicate that the big data platform reduced computation time very significantly relative to the standalone method. The proposed computational model obtained an average accuracy of 83% when compared with the SwiftVis software.
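To make the two algorithms named above more concrete, the following is a minimal standalone sketch of textbook SAX followed by random projection motif discovery. It is not the authors' modified implementation and does not use Hadoop or Spark. The alphabet size of 4, the function names, and all parameters are assumptions chosen for illustration, and the sketch omits the trivial-match filtering that a full motif discovery step would apply to overlapping windows.

```python
import math
import random
from bisect import bisect_right
from collections import defaultdict
from itertools import combinations

# Assumption: alphabet of size 4. These are the standard-normal quartile
# breakpoints, so each symbol is equally likely for z-normalized data.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def znorm(values):
    """Rescale a window to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std < 1e-12:  # flat window: avoid division by zero
        return [0.0] * n
    return [(v - mean) / std for v in values]


def paa(values, segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment.

    Assumes len(values) is divisible by segments.
    """
    size = len(values) // segments
    return [sum(values[i * size:(i + 1) * size]) / size for i in range(segments)]


def sax_word(values, segments):
    """Convert one window into a SAX word of length `segments`."""
    means = paa(znorm(values), segments)
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, m)] for m in means)


def sliding_sax(series, window, segments):
    """SAX word for every sliding window of the series."""
    return [sax_word(series[i:i + window], segments)
            for i in range(len(series) - window + 1)]


def random_projection_collisions(words, mask_size, iterations, seed=0):
    """Count how often each pair of words lands in the same bucket.

    Each iteration keeps `mask_size` randomly chosen symbol positions and
    buckets the words by the projected string. Pairs with high counts are
    motif candidates.
    """
    rng = random.Random(seed)
    word_len = len(words[0])
    counts = defaultdict(int)
    for _ in range(iterations):
        keep = sorted(rng.sample(range(word_len), mask_size))
        buckets = defaultdict(list)
        for idx, word in enumerate(words):
            buckets["".join(word[k] for k in keep)].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                counts[(i, j)] += 1
    return counts
```

In a Spark setting, a function such as `sliding_sax` could plausibly be wrapped in a UDF and applied per asteroid, but how the authors partition the data and structure their UDF is not stated in the abstract.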