IEEE Trans Vis Comput Graph. 2021 Feb;27(2):1731-1741. doi: 10.1109/TVCG.2020.3030443. Epub 2021 Jan 28.
Most visual analytics systems assume that all foraging for data happens before the analytics process; once analysis begins, the set of data attributes considered is fixed. Separating data construction from analysis in this way precludes iteration, in which foraging is informed by needs that arise in situ during the analysis, and it can limit the pace and scope of analysis. In this paper, we present CAVA, a system that integrates data curation and data augmentation with the traditional data exploration and analysis tasks, enabling information foraging in situ during analysis. Identifying attributes to add to a dataset is difficult because it requires human knowledge to determine which available attributes will be helpful for the ensuing analytical tasks. CAVA crawls knowledge graphs to provide users with a broad set of attributes, drawn from external data, to choose from. Users can then specify complex operations on knowledge graphs to construct additional attributes. CAVA shows how visual analytics can help users forage for attributes by letting them visually explore the set of available data and by serving as an interface for query construction. It also provides visualizations of the knowledge graph itself to help users understand complex joins such as multi-hop aggregations. In a user study over two datasets, we assess the ability of our system to enable users to perform complex data combinations without programming. We then demonstrate the generalizability of CAVA through two additional usage scenarios. The results of the evaluation confirm that CAVA is effective in helping users perform data foraging that leads to improved analysis outcomes, and they offer evidence in support of integrating data augmentation as a part of the visual analytics pipeline.
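To make the notion of a multi-hop aggregation concrete, the following is a minimal sketch of how an attribute could be derived for each row of a table by walking a knowledge graph and aggregating the values reached. It is not CAVA's implementation: the triple list, the entity names, the `borders` and `population` predicates, and the helper functions `build_index`, `follow_path`, and `augment` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Toy knowledge graph as (subject, predicate, object) triples.
# Every entity and value here is made up for illustration.
TRIPLES = [
    ("A", "borders", "B"),
    ("A", "borders", "C"),
    ("B", "borders", "A"),
    ("C", "borders", "A"),
    ("A", "population", 10),
    ("B", "population", 20),
    ("C", "population", 40),
]


def build_index(triples):
    """Map (subject, predicate) to the list of objects it points to."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def follow_path(index, start, path):
    """Follow a sequence of predicates from `start`, one hop per predicate."""
    frontier = [start]
    for predicate in path:
        frontier = [
            obj
            for node in frontier
            for obj in index.get((node, predicate), [])
        ]
    return frontier


def augment(rows, key, index, path, aggregate, new_attr):
    """Return copies of `rows` with `new_attr` set to the aggregated path values.

    Rows whose entity reaches no values along the path get None.
    """
    augmented = []
    for row in rows:
        values = follow_path(index, row[key], path)
        augmented.append({**row, new_attr: aggregate(values) if values else None})
    return augmented


if __name__ == "__main__":
    index = build_index(TRIPLES)
    table = [{"name": "A"}, {"name": "B"}]
    # Two hops: entity -> neighbours -> neighbours' population, then average.
    result = augment(table, "name", index, ["borders", "population"],
                     mean, "avg_neighbour_population")
    for row in result:
        print(row)
```

In this sketch, the path `["borders", "population"]` plays the role of a two-hop join and `mean` plays the role of the aggregation; swapping in `sum`, `max`, or `len` would yield a different derived attribute from the same path.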