Nguyen An, Huang Yu-Chieh, Tremouilhac Pierre, Jung Nicole, Bräse Stefan
Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany.
Institute of Organic Chemistry, Karlsruhe Institute of Technology, Fritz-Haber-Weg 6, 76131, Karlsruhe, Germany.
J Cheminform. 2019 Dec 11;11(1):77. doi: 10.1186/s13321-019-0400-5.
We developed CHEMSCANNER, a software tool for extracting chemical information from ChemDraw binary (CDX) and ChemDraw XML-based (CDXML) files and for retrieving ChemDraw schemes embedded in DOC, DOCX or XML documents. This facilitates the reuse of chemical information embedded in the diverse documents that serve as standard storage and communication instruments in the chemical sciences (e.g. students' theses, PhD theses, or publications). The extracted information is processed into reactions, molecules, and additional text and values, all of which can be accessed via the CHEMSCANNER UI. CHEMSCANNER supports export to Excel and CML, direct import of the extracted data into the open-source ELN Chemotion, and reuse of selected information via "copy and paste". The software was designed with a focus on processing documents that embed molecular structure information as CDX or CDXML, because these are the most common file formats for chemical drawings. The project aims to support chemists in their efforts to reuse chemistry research data by providing them with previously missing tools for the automated assembly of reaction data.
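To illustrate the kind of processing involved in turning a CDXML drawing into reaction data, here is a minimal sketch using only Python's standard library. It is not CHEMSCANNER's implementation. It assumes CDXML's `<fragment>`/`<n>` (atom) elements, where a missing `Element` attribute means carbon, and a `<step>` element whose `ReactionStepReactants`, `ReactionStepProducts` and `ReactionStepObjectsAbove/BelowArrow` attributes list space-separated object ids. The sample document and the helper names (`heavy_atom_counts`, `extract_reactions`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written CDXML-like snippet: two fragments, one text
# label and one reaction step linking them. Real files carry far more detail.
SAMPLE_CDXML = """<CDXML>
  <page id="1">
    <fragment id="10">
      <n id="11" Element="8"/>
      <n id="12"/>
      <b id="13" B="11" E="12"/>
    </fragment>
    <fragment id="20">
      <n id="21"/>
      <n id="22"/>
      <n id="23" Element="7"/>
      <b id="24" B="21" E="22"/>
      <b id="25" B="22" E="23"/>
    </fragment>
    <t id="30"><s>reflux, 2 h</s></t>
    <scheme id="40">
      <step id="41" ReactionStepReactants="10" ReactionStepProducts="20"
            ReactionStepObjectsBelowArrow="30"/>
    </scheme>
  </page>
</CDXML>"""

SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}  # small subset, enough for the demo


def heavy_atom_counts(fragment):
    """Count explicit atoms of a fragment by element symbol.

    Only direct <n> children are read: implicit hydrogens and nested
    abbreviation fragments are ignored in this sketch.
    """
    counts = {}
    for node in fragment.findall("n"):
        z = int(node.get("Element", "6"))  # assumed default: carbon
        symbol = SYMBOLS.get(z, f"Z{z}")
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def extract_reactions(cdxml_text):
    """Return one dict per <step>: reactants, products and condition text."""
    root = ET.fromstring(cdxml_text)
    # Index every element carrying an id so step attributes can be resolved.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

    def resolve(step, attr):
        return [by_id[i] for i in step.get(attr, "").split() if i in by_id]

    reactions = []
    for step in root.iter("step"):
        labels = resolve(step, "ReactionStepObjectsAboveArrow") + resolve(
            step, "ReactionStepObjectsBelowArrow"
        )
        reactions.append({
            "reactants": [heavy_atom_counts(f) for f in resolve(step, "ReactionStepReactants")],
            "products": [heavy_atom_counts(f) for f in resolve(step, "ReactionStepProducts")],
            "conditions": ["".join(t.itertext()).strip() for t in labels],
        })
    return reactions


if __name__ == "__main__":
    for reaction in extract_reactions(SAMPLE_CDXML):
        print(reaction)
```

For the sample this yields one reaction with reactant counts `{'O': 1, 'C': 1}`, product counts `{'C': 2, 'N': 1}` and the condition text `reflux, 2 h`. Resolving everything through an id index mirrors how CDXML links scheme steps to drawn objects by reference rather than by nesting.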