DECIMER 1.0: deep learning for chemical image recognition using transformers.

Authors

Rajan Kohulan, Zielesny Achim, Steinbeck Christoph

Affiliations

Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Lessingstr. 8, 07743, Jena, Germany.

Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665, Recklinghausen, Germany.

Publication information

J Cheminform. 2021 Aug 17;13(1):61. doi: 10.1186/s13321-021-00538-8.

DOI:10.1186/s13321-021-00538-8
PMID:34404468
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8369700/
Abstract

The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. Extracting data from those articles and storing it in a publicly accessible database is desirable, but doing this manually is slow and error-prone. To extract chemical structure depictions and convert them into a computer-readable format, Optical Chemical Structure Recognition (OCSR) tools have been developed; the best-performing of these are mostly rule-based. The DECIMER (Deep lEarning for Chemical ImagE Recognition) project was launched to address the OCSR problem with the latest computational intelligence methods and to provide an automated open-source software solution. Various current deep learning approaches were explored to find the best-fitting solution to the problem. In a preliminary communication, we outlined the prospect of predicting SMILES encodings of chemical structure depictions with about 90% accuracy using a dataset of 50-100 million molecules. This article presents the new DECIMER model, a transformer-based network that can predict SMILES with above 96% accuracy from depictions of chemical structures without stereochemical information and above 89% accuracy for depictions with stereochemical information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/cda973ff37bb/13321_2021_538_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/8c9c336d9461/13321_2021_538_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/d04e646fe3ca/13321_2021_538_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/f2398e7ce825/13321_2021_538_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/b476b19a0a3f/13321_2021_538_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/f8e9cc062b15/13321_2021_538_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/2472c246c8d3/13321_2021_538_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/31778ed716cd/13321_2021_538_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/6dbc05fd5fb5/13321_2021_538_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef27/8369700/00445cc9164a/13321_2021_538_Fig9_HTML.jpg
