Chen Jian, Ling Meng, Li Rui, Isenberg Petra, Isenberg Tobias, Sedlmair Michael, Moller Torsten, Laramee Robert S, Shen Han-Wei, Wunsche Katharina, Wang Qiru
IEEE Trans Vis Comput Graph. 2021 Sep;27(9):3826-3833. doi: 10.1109/TVCG.2021.3054916. Epub 2021 Jul 29.
We present the VIS30K dataset, a collection of 29,689 images that represents 30 years of figures and tables from each track of the IEEE Visualization conference series (Vis, SciVis, InfoVis, VAST). VIS30K's comprehensive coverage of the scientific literature in visualization not only reflects the progress of the field but also enables researchers to study the evolution of the state of the art and to find relevant work based on graphical content. We describe the dataset and our semi-automatic collection process, which couples convolutional neural networks (CNNs) with manual curation. Extracting figures and tables semi-automatically allows us to verify that no images are overlooked or extracted erroneously. To improve quality further, we engaged in a peer-search process for high-quality figures from early IEEE Visualization papers. With the resulting data, we also contribute VISImageNavigator (VIN, visimagenavigator.github.io), a web-based tool that facilitates searching and exploring VIS30K by author names, paper keywords, title and abstract, and years.