Bajić Filip, Habijan Marija, Nenadić Krešimir
University Computing Centre, University of Zagreb, 10000 Zagreb, Croatia.
Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, 31000 Osijek, Croatia.
Data Brief. 2024 Feb 21;53:110233. doi: 10.1016/j.dib.2024.110233. eCollection 2024 Apr.
We introduce a curated synthetic chart dataset intended to support algorithm development in data visualization and interpretation. The dataset is designed for training and testing and covers a range of chart types, including but not limited to Area, Bar, Box, Donut, Line, Pie, and Scatter. Data collection relies on a fully automatic low-level algorithm focused on the extraction of graphical elements. To keep that algorithm efficient, input images must not contain three-dimensional representations or 3D effects, and each image must contain only a single chart. The dataset is divided into training and testing subsets, which are further subdivided by resolution and by chart type. The dataset has substantial reuse potential. Researchers can use it to train and test deep models for chart classification and interpretation and to improve the adaptability of their algorithms. It also serves as a benchmark for evaluating how well systems handle diverse chart visualizations, which allows direct comparisons between them. Because it spans multiple chart types and resolutions, the dataset provides a standardized platform for assessing and comparing how effectively different systems understand and decompose visualizations [1,2,3].
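The organization described in the abstract (training and testing subsets, subdivided by resolution and chart type) can be illustrated with a short indexing sketch. The abstract does not specify the on-disk layout, so the directory structure split/resolution/chart_type/image.png, the .png extension, and the function names index_charts and count_by_split_and_type are all assumptions made for illustration, not the authors' actual format or tooling.

```python
from collections import Counter
from pathlib import Path


def index_charts(root):
    """Collect (split, resolution, chart_type, path) for each image under root.

    ASSUMPTION: images are stored as root/<split>/<resolution>/<chart_type>/<file>.png.
    This layout is hypothetical; adjust it to match the published dataset.
    """
    records = []
    for path in sorted(Path(root).rglob("*.png")):
        parts = path.relative_to(root).parts
        if len(parts) != 4:
            # Skip files that do not match the assumed four-level layout.
            continue
        split, resolution, chart_type, _ = parts
        records.append((split, resolution, chart_type.lower(), path))
    return records


def count_by_split_and_type(records):
    """Count images per (split, chart_type) pair, e.g. to check class balance."""
    return Counter((split, chart_type) for split, _, chart_type, _ in records)
```

Under these assumptions, calling count_by_split_and_type(index_charts("path/to/dataset")) would give a quick check of how many images of each chart type fall into the training and testing subsets before any model is trained.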