Hazarika Subhashis, Biswas Ayan, Dutta Soumya, Shen Han-Wei
GRAVITY Research Group, Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43218-2646, USA.
Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
Entropy (Basel). 2018 Jul 20;20(7):540. doi: 10.3390/e20070540.
Uncertainty of scalar values in an ensemble dataset is often represented by the collection of their corresponding isocontours. Various techniques have been proposed to analyze and visualize ensemble isocontours, including contour boxplots, contour variability plots, glyphs, and probabilistic marching cubes. All of these techniques assume that the user already knows the scalar value of interest. Little work has been done to guide users in selecting scalar values for such uncertainty analysis. Analyzing and visualizing a large collection of ensemble isocontours for a selected scalar value also poses its own challenges, and interpreting the resulting visualizations is difficult. In this work, we propose a new information-theoretic approach to these issues. We use specific information measures that estimate the predictability and surprise of individual scalar values, and with them we evaluate the overall uncertainty associated with every scalar value in an ensemble system. This helps scientists understand how uncertainty affects different data features. To understand in finer detail how individual members contribute to the uncertainty of the ensemble isocontours for a selected scalar value, we propose a conditional entropy based algorithm that quantifies each member's contribution. By identifying the members that contribute most to the overall uncertainty, this algorithm can simplify analysis and visualization for systems with a large number of members. We demonstrate the efficacy of our method by applying it to real-world datasets from materials science, weather forecasting, and ocean simulation experiments.
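The two specific information measures mentioned above, surprise and predictability, have standard definitions for a pair of discrete random variables X and Y. Surprise is I1(x) = sum over y of p(y|x) log2(p(y|x)/p(y)). Predictability is I2(x) = H(Y) - H(Y | X = x). The sketch below is a minimal illustration of those textbook definitions. It assumes that the ensemble values have already been binned into a joint probability table. The abstract does not say how the authors choose X and Y or how they bin ensemble values, and the function name `specific_information` is hypothetical. This is not the paper's implementation.

```python
import math


def specific_information(joint):
    """Hypothetical helper: per-value surprise (I1) and predictability (I2).

    Assumes joint[i][j] is the joint probability p(x_i, y_j) of two
    already-discretized variables, with all entries summing to 1.
    Returns two lists indexed by x_i:
      surprise[i]       = sum_j p(y_j|x_i) * log2(p(y_j|x_i) / p(y_j))
      predictability[i] = H(Y) - H(Y | X = x_i)
    """
    p_x = [sum(row) for row in joint]          # marginal p(x)
    p_y = [sum(col) for col in zip(*joint)]    # marginal p(y)
    h_y = -sum(p * math.log2(p) for p in p_y if p > 0)  # H(Y)

    surprise, predictability = [], []
    for row, px in zip(joint, p_x):
        if px == 0:
            # Value never occurs; define both measures as 0 for convenience.
            surprise.append(0.0)
            predictability.append(0.0)
            continue
        cond = [pxy / px for pxy in row]  # conditional p(y | x)
        # KL divergence of p(y|x) from p(y); p(y|x) > 0 implies p(y) > 0.
        i1 = sum(c * math.log2(c / py) for c, py in zip(cond, p_y) if c > 0)
        h_y_given_x = -sum(c * math.log2(c) for c in cond if c > 0)
        surprise.append(i1)
        predictability.append(h_y - h_y_given_x)
    return surprise, predictability


if __name__ == "__main__":
    s, p = specific_information([[0.25, 0.25], [0.5, 0.0]])
    print("surprise:", s)
    print("predictability:", p)
```

I1 is a KL divergence, so it is never negative. I2 can be negative when observing x leaves Y more uncertain than before. Averaging either measure over p(x) gives the same value, the mutual information I(X;Y), which is why both are treated as decompositions of the overall shared information.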