Dutta Soumya, Biswas Ayan, Ahrens James
Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
Entropy (Basel). 2019 Jul 16;21(7):699. doi: 10.3390/e21070699.
As the computing capabilities of modern supercomputers increase, the volume of data generated by scientific simulations is growing rapidly. Application scientists therefore need effective data summarization techniques that reduce large-scale multivariate spatiotemporal data sets while preserving their important properties, so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. When studying complex scientific events, domain experts often analyze and visualize two or more variables together to better understand the characteristics of the data features. A summarization technique must therefore analyze multivariate relationships in detail and then reduce the data in a way that preserves the important features involving multiple variables. To this end, we propose a data sub-sampling algorithm for statistical data summarization. The algorithm uses pointwise information-theoretic measures to quantify the statistical association of each data point across multiple variables, and it generates sub-sampled data that preserve the statistical association among those variables. We show that multivariate feature queries and analyses can be carried out effectively on the reduced, sampled data. We demonstrate the efficacy of the proposed multivariate association-driven sampling algorithm by applying it to several scientific data sets.
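The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two variables, takes pointwise mutual information (PMI) estimated from a joint histogram as the pointwise measure, and draws points with probability proportional to their min-max normalized PMI. The function names, the bin count, the sampling fraction, and the small weight offset are all hypothetical choices made for illustration.

```python
import numpy as np

def pmi_per_point(x, y, bins=32):
    """Estimate PMI(x, y) = log2(p(x, y) / (p(x) p(y))) for every data point.

    Probabilities come from a joint histogram (an illustrative assumption).
    """
    joint, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    joint /= joint.sum()                 # joint distribution p(x, y)
    px = joint.sum(axis=1)               # marginal p(x)
    py = joint.sum(axis=0)               # marginal p(y)
    # Map each point to its histogram bin using the interior bin edges.
    ix = np.clip(np.digitize(x, x_edges[1:-1]), 0, bins - 1)
    iy = np.clip(np.digitize(y, y_edges[1:-1]), 0, bins - 1)
    # Every point lies in a bin it helped populate, so p(x, y) > 0 here.
    return np.log2(joint[ix, iy] / (px[ix] * py[iy]))

def pmi_driven_sample(x, y, fraction=0.1, bins=32, seed=0):
    """Return indices of a sub-sample biased toward high-PMI points."""
    rng = np.random.default_rng(seed)
    pmi = pmi_per_point(x, y, bins)
    span = pmi.max() - pmi.min()
    # Min-max normalize PMI to [0, 1]; fall back to uniform weights if flat.
    weights = (pmi - pmi.min()) / span if span > 0 else np.ones_like(pmi)
    weights = weights + 1e-6             # hypothetical offset: keep all points selectable
    k = max(1, int(round(fraction * x.size)))
    return rng.choice(x.size, size=k, replace=False, p=weights / weights.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(size=10_000)
    b = 0.9 * a + 0.1 * rng.normal(size=10_000)   # strongly associated variable
    idx = pmi_driven_sample(a, b, fraction=0.05)
    print(idx.size, "points kept out of", a.size)
```

Averaging the per-point PMI values over the full data set gives the histogram estimate of the mutual information between the two variables, so points with high PMI are those that contribute most to the measured association.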