Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks.

Author Information

Dibia Victor, Demiralp Cagatay

Publication Information

IEEE Comput Graph Appl. 2019 Sep-Oct;39(5):33-46. doi: 10.1109/MCG.2019.2924636. Epub 2019 Jun 24.

Abstract

Rapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. Even high-level, dedicated visualization tools often require users to manually select among data attributes, decide which transformations to apply, and specify mappings between visual encoding variables and raw or transformed attributes. In this paper we introduce Data2Vis, an end-to-end trainable neural translation model for automatically generating visualizations from given datasets. We formulate visualization generation as a language translation problem, where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite). To this end, we train a multilayered attention-based encoder-decoder network with long short-term memory (LSTM) units on a corpus of visualization specifications. Qualitative results show that our model learns the vocabulary and syntax for a valid visualization specification, appropriate transformations (count, bins, mean), and how to use common data selection patterns that occur within data visualizations. We introduce two metrics for evaluating the task of automated visualization generation (language syntax validity, visualization grammar syntax validity) and demonstrate the efficacy of bidirectional models with attention mechanisms for this task. Data2Vis generates visualizations that are comparable to manually created visualizations in a fraction of the time, with potential to learn more complex visualization strategies at scale.
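The abstract frames visualization generation as sequence-to-sequence translation: a JSON data record is the source sequence and a Vega-Lite specification is the target, learned by a multilayered, bidirectional LSTM encoder-decoder with attention. The sketch below illustrates that framing in PyTorch at the character level. It is not the authors' implementation: the toy source/target pair, hyperparameters, and dot-product attention variant are illustrative assumptions.

```python
# Minimal sketch (assumed details, not the paper's code): a character-level
# bidirectional LSTM encoder + LSTM decoder with dot-product attention that
# maps a JSON data record string to a Vega-Lite specification string.
import torch
import torch.nn as nn

# One toy (source, target) pair; the paper trains on a corpus of Vega-Lite specs.
SRC = '{"a": 4, "b": "x"}'
TGT = '{"mark": "bar", "encoding": {"x": {"field": "b"}, "y": {"field": "a"}}}'

chars = sorted(set(SRC + TGT))
stoi = {c: i for i, c in enumerate(chars)}
BOS, EOS = len(chars), len(chars) + 1          # special decoder tokens
VOCAB = len(chars) + 2

def to_ids(text):
    """Character-level tokenization into a 1-D LongTensor."""
    return torch.tensor([stoi[c] for c in text], dtype=torch.long)

class Seq2Seq(nn.Module):
    """Bidirectional multilayer LSTM encoder, LSTM decoder, dot-product attention."""
    def __init__(self, vocab, emb=64, hid=128, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hid, num_layers=layers,
                               bidirectional=True, batch_first=True)
        self.decoder = nn.LSTM(emb, 2 * hid, num_layers=layers, batch_first=True)
        self.project = nn.Linear(4 * hid, vocab)  # [decoder state; context] -> vocab

    def forward(self, src, tgt_in):
        enc_out, _ = self.encoder(self.embed(src))     # (B, S, 2*hid)
        dec_out, _ = self.decoder(self.embed(tgt_in))  # (B, T, 2*hid)
        # Each decoder step attends over all encoder states (dot-product attention).
        weights = torch.softmax(dec_out @ enc_out.transpose(1, 2), dim=-1)  # (B, T, S)
        context = weights @ enc_out                    # (B, T, 2*hid)
        return self.project(torch.cat([dec_out, context], dim=-1))

# One teacher-forced training step on the toy pair.
src = to_ids(SRC).unsqueeze(0)
tgt = to_ids(TGT)
tgt_in = torch.cat([torch.tensor([BOS]), tgt]).unsqueeze(0)   # decoder input
tgt_out = torch.cat([tgt, torch.tensor([EOS])]).unsqueeze(0)  # decoder target

model = Seq2Seq(VOCAB)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
logits = model(src, tgt_in)                                   # (1, T+1, VOCAB)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tgt_out.reshape(-1))
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.3f}")
```

At inference time the decoder would presumably be run autoregressively from the BOS token; the generated string is then parsed as Vega-Lite JSON, which is the kind of check the paper's syntax-validity metrics perform.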

