IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):3799-3819. doi: 10.1109/TPAMI.2020.2992028. Epub 2021 Oct 1.
Charts are useful communication tools for presenting data in a visually appealing format that facilitates comprehension. Many studies have been dedicated to chart mining, which refers to the process of automatically detecting, extracting and analyzing charts to reproduce the tabular data originally used to create them. By allowing access to data that might not be available in other formats, chart mining facilitates the creation of many downstream applications. This paper presents a comprehensive survey of approaches across all components of the automated chart mining pipeline, namely (i) automated extraction of charts from documents; (ii) processing of multi-panel charts; (iii) automatic image classifiers to collect chart images at scale; (iv) automated extraction of data from each chart image, for popular chart types as well as selected specialized classes; (v) applications of chart mining; and (vi) datasets for training and evaluation, and the methods used to build them. Finally, we summarize the main trends found in the literature and provide pointers to areas for further research in chart mining.