Oppermann Michael, Kincaid Robert, Munzner Tamara
IEEE Trans Vis Comput Graph. 2021 Feb;27(2):495-505. doi: 10.1109/TVCG.2020.3030387. Epub 2021 Jan 28.
Cloud-based visualization services have made visual analytics accessible to a much wider audience than ever before. Systems such as Tableau have started to amass increasingly large repositories of analytical knowledge in the form of interactive visualization workbooks. When shared, these collections can form a visual analytic knowledge base. However, as the size of a collection increases, so does the difficulty of finding relevant information. Content-based recommendation (CBR) systems could help analysts find and manage workbooks relevant to their interests. Toward this goal, we focus on text-based content that is representative of the subject matter of visualizations rather than their visual encodings and style. We discuss the challenges associated with creating a CBR system based on visualization specifications and explore more concretely how to implement the required relevance measures using Tableau workbook specifications as the source of content data. We also demonstrate what information can be extracted from these visualization specifications and how various natural language processing techniques can be used to compute similarity between workbooks as one way to measure relevance. We report on a crowd-sourced user study to determine whether our similarity measure mimics human judgement. Finally, we choose latent Dirichlet allocation (LDA) as a specific model and instantiate it in a proof-of-concept recommender tool to demonstrate the basic function of our similarity measure.
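The abstract describes ranking workbooks by a text-based similarity measure, with LDA as the chosen model. The sketch below illustrates only the final ranking step and rests on several assumptions. It assumes each workbook has already been reduced to a topic distribution, as a fitted LDA model would produce. It uses cosine similarity as the comparison, which is an illustrative choice and not necessarily the measure used in the paper. The workbook IDs, topic values, and the helpers `cosine_similarity` and `recommend` are all hypothetical.

```python
import math


def cosine_similarity(p: list[float], q: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def recommend(query_id: str, topics: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Hypothetical helper: rank the other workbooks by similarity to query_id, most similar first."""
    query = topics[query_id]
    scored = [
        (wid, cosine_similarity(query, vec))
        for wid, vec in topics.items()
        if wid != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical per-workbook topic mixtures over three topics (e.g. sales, weather, health).
# In a real system these would come from an LDA model fitted to text extracted
# from workbook specifications, not be written by hand.
topics = {
    "sales_q1": [0.80, 0.15, 0.05],
    "sales_q2": [0.75, 0.20, 0.05],
    "rainfall": [0.05, 0.90, 0.05],
    "hospital": [0.10, 0.05, 0.85],
}

print(recommend("sales_q1", topics))
```

With these made-up values, the other sales workbook ranks first because its topic mixture is nearly parallel to the query's. Comparing topic mixtures instead of raw word counts lets two workbooks match when they share subject matter but use different field names.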