DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature.

Authors

Rajan Kohulan, Brinkhaus Henning Otto, Sorokina Maria, Zielesny Achim, Steinbeck Christoph

Affiliations

Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Lessingstr. 8, 07743, Jena, Germany.

Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany.

Publication information

J Cheminform. 2021 Mar 8;13(1):20. doi: 10.1186/s13321-021-00496-1.

DOI: 10.1186/s13321-021-00496-1
PMID: 33685498
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7941967/
Abstract

Chemistry looks back at many decades of publications on chemical compounds, their structures and properties, in scientific articles. Liberating this knowledge (semi-)automatically and making it available to the world in open-access databases is a current challenge. Apart from mining textual information, Optical Chemical Structure Recognition (OCSR), the translation of an image of a chemical structure into a machine-readable representation, is part of this workflow. As the OCSR process requires an image containing a chemical structure, there is a need for a publicly available tool that automatically recognizes and segments chemical structure depictions from scientific publications. This is especially important for older documents which are only available as scanned pages. Here, we present DECIMER (Deep lEarning for Chemical IMagE Recognition) Segmentation, the first open-source, deep learning-based tool for automated recognition and segmentation of chemical structures from the scientific literature. The workflow is divided into two main stages. During the detection step, a deep learning model recognizes chemical structure depictions and creates masks which define their positions on the input page. Subsequently, potentially incomplete masks are expanded in a post-processing workflow. The performance of DECIMER Segmentation has been manually evaluated on three sets of publications from different publishers. The approach operates on bitmap images of journal pages to be applicable also to older articles before the introduction of vector images in PDFs. By making the source code and the trained model publicly available, we hope to contribute to the development of comprehensive chemical data extraction workflows. In order to facilitate access to DECIMER Segmentation, we also developed a web application. The web application, available at https://decimer.ai, lets the user upload a PDF file and retrieve the segmented structure depictions.
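
The second stage described above, expanding a potentially incomplete mask, can be illustrated with a small sketch. The abstract does not specify how the expansion is done, so the approach here is an assumption chosen for illustration only and is not the authors' published algorithm: the mask is grown over every dark pixel that is 8-connected to ink the mask already covers. The function names `expand_mask` and `bounding_box`, the `ink_threshold` parameter, and the toy page are all hypothetical.

```python
from collections import deque

Grid = list[list[int]]    # grayscale page, 0 = black, 255 = white
Mask = list[list[bool]]   # True where the detector marked a structure


def expand_mask(page: Grid, mask: Mask, ink_threshold: int = 128) -> Mask:
    """Grow `mask` over every dark pixel that is 8-connected to ink inside it.

    Illustrative assumption only; not the DECIMER Segmentation implementation.
    """
    height, width = len(page), len(page[0])
    expanded = [row[:] for row in mask]
    # Seeds: dark ("ink") pixels that the detector's mask already covers.
    queue = deque(
        (r, c)
        for r in range(height)
        for c in range(width)
        if mask[r][c] and page[r][c] < ink_threshold
    )
    while queue:
        r, c = queue.popleft()
        # Visit the 3x3 neighbourhood, clipped to the page borders.
        for nr in range(max(r - 1, 0), min(r + 2, height)):
            for nc in range(max(c - 1, 0), min(c + 2, width)):
                if page[nr][nc] < ink_threshold and not expanded[nr][nc]:
                    expanded[nr][nc] = True
                    queue.append((nr, nc))
    return expanded


def bounding_box(mask: Mask) -> tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the mask; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, covered in enumerate(row) if covered]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


# Toy 7x12 "page": white background, one horizontal stroke, one stray dark pixel.
page = [[255] * 12 for _ in range(7)]
for col in range(1, 10):
    page[3][col] = 0   # the "structure": a stroke spanning columns 1..9
page[0][11] = 0        # unrelated ink, e.g. a page number

# Hypothetical incomplete detector mask covering only the left part of the stroke.
mask = [[2 <= r <= 4 and 1 <= c <= 4 for c in range(12)] for r in range(7)]

expanded = expand_mask(page, mask)
print(bounding_box(mask))      # (2, 1, 5, 5)
print(bounding_box(expanded))  # (2, 1, 5, 10)
```

In this toy case the expanded mask follows the stroke to its right-hand end, while the unconnected dark pixel in the corner stays outside it. Working on pixels rather than on PDF vector objects mirrors the abstract's point that the approach operates on bitmap images, which keeps it applicable to scanned pages.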

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0023/7941967/4c7d6794ece4/13321_2021_496_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0023/7941967/b4102ed8f5ad/13321_2021_496_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0023/7941967/18aac8de1152/13321_2021_496_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0023/7941967/aab6fee8cfd7/13321_2021_496_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0023/7941967/8eac0b9d12c3/13321_2021_496_Fig5_HTML.jpg
