
MolMiner: You Only Look Once for Chemical Structure Recognition.

Affiliations

Infinite Intelligence Pharma, Beijing, China 100083.

Center for Quantitative Biology, Peking University, Beijing, China 100871.

Publication information

J Chem Inf Model. 2022 Nov 28;62(22):5321-5328. doi: 10.1021/acs.jcim.2c00733. Epub 2022 Sep 15.

DOI:10.1021/acs.jcim.2c00733
PMID:36108142
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9710516/
Abstract

Molecular structures are commonly depicted in 2D printed form in scientific documents such as journal papers and patents. However, these 2D depictions are not machine readable. Because of a decades-long backlog and a growing volume of printed literature, there is high demand for translating printed depictions into machine-readable formats, a task known as Optical Chemical Structure Recognition (OCSR). Most OCSR systems developed over the last three decades use a rule-based approach, which vectorizes the depiction and interprets the resulting vectors and nodes as bonds and atoms. Here, we present a practical software tool called MolMiner, which is built primarily on deep neural networks originally developed for semantic segmentation and object detection and uses them to recognize atom and bond elements in documents. These recognized elements can then be connected into a molecular graph with a distance-based construction algorithm. MolMiner gave state-of-the-art performance on four benchmark data sets and on a self-collected external data set from scientific papers. Because MolMiner performs similarly well on real-world OCSR tasks and offers a user-friendly interface, it is a useful tool for daily applications. Free download links for the Mac and Windows versions are available at https://github.com/iipharma/pharmamind-molminer.
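The abstract does not spell out the distance-based construction step, so the following is only a minimal sketch of one way such a step could work, not MolMiner's actual implementation. It assumes a detector has already produced atom centers and bond line segments in pixel coordinates; the `Atom` and `Bond` types, the `build_graph` function, and the `max_dist` threshold are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Atom:
    """Hypothetical detector output: element symbol and box center (pixels)."""
    symbol: str
    x: float
    y: float


@dataclass
class Bond:
    """Hypothetical detector output: bond segment endpoints and bond order."""
    x1: float
    y1: float
    x2: float
    y2: float
    order: int = 1


def nearest_atom(atoms, x, y, max_dist):
    """Return the index of the atom closest to (x, y), or None if none is within max_dist."""
    best, best_d = None, max_dist
    for i, atom in enumerate(atoms):
        d = hypot(atom.x - x, atom.y - y)
        if d <= best_d:
            best, best_d = i, d
    return best


def build_graph(atoms, bonds, max_dist=15.0):
    """Attach each bond endpoint to its nearest atom and return (i, j, order) edges.

    Bonds whose endpoints cannot be matched to two distinct atoms are skipped.
    """
    edges = []
    for bond in bonds:
        i = nearest_atom(atoms, bond.x1, bond.y1, max_dist)
        j = nearest_atom(atoms, bond.x2, bond.y2, max_dist)
        if i is None or j is None or i == j:
            continue
        edges.append((min(i, j), max(i, j), bond.order))
    return edges


# Toy input resembling a C-C-O chain drawn left to right.
atoms = [Atom("C", 0, 0), Atom("C", 30, 0), Atom("O", 60, 0)]
bonds = [Bond(2, 1, 28, -1), Bond(31, 0, 58, 2)]
print(build_graph(atoms, bonds))  # [(0, 1, 1), (1, 2, 1)]
```

In this sketch the distance threshold guards against attaching a bond to a far-away atom when a detection is missing. A real system would also need to handle implicit carbons at unlabeled vertices, stereo bonds, and abbreviations, none of which are covered here.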


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e26f/9710516/e47401cf15a3/ci2c00733_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e26f/9710516/0e898e9df6e9/ci2c00733_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e26f/9710516/6dd6f7e628e9/ci2c00733_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e26f/9710516/ba82a7823897/ci2c00733_0004.jpg


