MacEachren Alan, Dai Xiping, Hardisty Frank, Guo Diansheng, Lengerich Gene
GeoVISTA Center, Department of Geography, The Pennsylvania State University, University Park, PA 16802.
IEEE Conf Inf Vis. 2003:31-38. doi: 10.1109/INFVIS.2003.1249006.
We introduce an approach to visual analysis of multivariate data that integrates several methods from information visualization, exploratory data analysis (EDA), and geovisualization. The approach leverages the component-based architecture implemented in GeoVISTA Studio to construct a flexible, multiview, tightly (but generically) coordinated EDA toolkit. This toolkit builds upon traditional ideas behind both small multiples and scatterplot matrices in three fundamental ways. First, we develop a general MultiForm Bivariate Matrix and a complementary MultiForm Bivariate Small Multiple plot in which different bivariate representation forms can be used in combination. We demonstrate the flexibility of this approach with matrices and small multiples that depict multivariate data through combinations of scatterplots, bivariate maps, and space-filling displays. Second, we apply a measure of conditional entropy to (a) identify variables from a high-dimensional data set that are likely to display interesting relationships and (b) generate a default order of these variables in the matrix or small multiple display. Third, we add conditioning, a kind of dynamic query/filtering in which supplementary (undisplayed) variables are used to constrain the view onto the variables that are displayed. Conditioning allows the effects of one or more well-understood variables to be removed from the analysis, making relationships among the remaining variables easier to explore. We illustrate the individual and combined functionality enabled by this approach by applying it to the analysis of cancer diagnosis and mortality data and their associated covariates and risk factors.
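The second and third ideas can be illustrated with a minimal sketch. It is not the implementation in GeoVISTA Studio, and the abstract does not specify the exact entropy measure or discretization. The sketch assumes equal-width binning, the textbook conditional entropy H(Y | X) estimated from bin counts, and a simple range filter for conditioning. All function names (`equal_width_bins`, `conditional_entropy`, `rank_pairs`, `condition`) are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=4):
    """Discretize numbers into n_bins equal-width bins (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def conditional_entropy(x_bins, y_bins):
    """Estimate H(Y | X) in bits from paired discrete labels."""
    n = len(x_bins)
    joint = Counter(zip(x_bins, y_bins))
    marginal_x = Counter(x_bins)
    h = 0.0
    for (x, _y), count in joint.items():
        # p(x, y) * log2 p(y | x), summed over observed cells
        h -= (count / n) * math.log2(count / marginal_x[x])
    return h


def rank_pairs(columns, n_bins=4):
    """Rank variable pairs, lowest conditional entropy first.

    Assumption: a pair is scored by the smaller of H(B | A) and H(A | B),
    so a low score means one variable is largely predictable from the other.
    """
    binned = {name: equal_width_bins(vals, n_bins) for name, vals in columns.items()}
    names = list(columns)
    scores = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = min(
                conditional_entropy(binned[a], binned[b]),
                conditional_entropy(binned[b], binned[a]),
            )
            scores.append((score, a, b))
    return sorted(scores)


def condition(columns, on, low, high):
    """Keep only rows whose conditioning variable `on` lies in [low, high]."""
    keep = [i for i, v in enumerate(columns[on]) if low <= v <= high]
    return {name: [vals[i] for i in keep] for name, vals in columns.items()}
```

In this sketch, `rank_pairs` could supply a default ordering for a matrix or small multiple display by placing low-entropy pairs first, and `condition` could be applied to an undisplayed variable before the displayed variables are re-ranked or redrawn. Both choices are assumptions about how the ideas might be wired together, not a description of the toolkit.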