Lee Jaewoong, Lee Wonseok, Kim Jihan
Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea.
ACS Appl Mater Interfaces. 2024 Jan 10;16(1):723-730. doi: 10.1021/acsami.3c14781. Epub 2023 Dec 26.
We developed Material Graph Digitizer (MatGD), a tool for digitizing data lines from scientific graphs. The algorithm behind the tool consists of four steps: (1) identifying graphs within subfigures, (2) separating the axes from the data section, (3) discerning the data lines by eliminating irrelevant graph objects and matching them with the legend, and (4) extracting and saving the data. We mined 501,045 figures from 62,534 papers in the areas of batteries, catalysis, and metal-organic frameworks (MOFs). The tool achieved over 99% accuracy in legend marker and text detection. Its capability for data line separation stood at 66%, which is much higher than that of other existing figure-mining tools. We believe that this tool will be integral to collecting both past and future data from publications, and that these data can be used to train machine learning models that enhance materials prediction and new materials discovery.
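To make steps (3) and (4) more concrete, the following is a minimal sketch of how one data line might be picked out by its legend color and converted from pixel coordinates to data values. It is not MatGD's implementation: the names `AxisCalibration`, `color_distance`, and `extract_line`, the color-distance threshold, and the assumption of linear axes are all hypothetical choices made for illustration, and the image is assumed to be already cropped to the data section as a nested list of RGB tuples.

```python
from dataclasses import dataclass


@dataclass
class AxisCalibration:
    """Hypothetical linear map from a pixel position to a data value.

    Assumes a linear (not logarithmic) axis defined by two reference ticks.
    """
    pixel_a: float
    pixel_b: float
    value_a: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        scale = (self.value_b - self.value_a) / (self.pixel_b - self.pixel_a)
        return self.value_a + (pixel - self.pixel_a) * scale


def color_distance(c1, c2):
    """Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5


def extract_line(image, legend_color, x_axis, y_axis, tolerance=30.0):
    """Return (x, y) data points for pixels whose color matches the legend.

    image        : list of rows, each row a list of (r, g, b) tuples
    legend_color : (r, g, b) of the legend marker for the target line
    tolerance    : assumed threshold on RGB distance (illustrative value)
    """
    height = len(image)
    width = len(image[0]) if height else 0
    points = []
    for col in range(width):
        # Collect every row in this column that matches the legend color.
        rows = [
            row for row in range(height)
            if color_distance(image[row][col], legend_color) <= tolerance
        ]
        if not rows:
            continue  # no part of the line passes through this column
        # Use the mean row so that a thick line yields one point per column.
        mean_row = sum(rows) / len(rows)
        points.append((x_axis.to_value(col), y_axis.to_value(mean_row)))
    return points
```

In this sketch, pixels belonging to other lines, grid lines, or annotations are ignored simply because their color falls outside the tolerance. A real figure would also need handling for overlapping lines, markers, and anti-aliasing, which this example does not attempt.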