Molecular Structure Extraction from Documents Using Deep Learning.

Affiliations

Schrödinger, Inc., 101 SW Main Street, Portland, Oregon 97204, United States.

Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, United States.

Publication information

J Chem Inf Model. 2019 Mar 25;59(3):1017-1029. doi: 10.1021/acs.jcim.8b00669. Epub 2019 Feb 27.

DOI: 10.1021/acs.jcim.8b00669
PMID: 30758950

Abstract

Chemical structure extraction from documents remains a hard problem because of both false positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well generally but still routinely encounter situations where recognition rates are not yet satisfactory and systematic improvement is challenging. Complications impacting the performance of current approaches include the diversity in visual styles used by various software to render structures, the frequent use of ad hoc annotations, and other challenges related to image quality, including resolution and noise. We present end-to-end deep learning solutions for both segmenting molecular structures from documents and predicting chemical structures from the segmented images. This deep-learning-based approach does not require any handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using the deep learning approach described herein, we show that it is possible to perform well on both segmentation and prediction of low-resolution images containing moderately sized molecules found in journal articles and patents.
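The abstract describes two stages, segmentation and then structure prediction, but does not say how the output of the first stage is handed to the second. A minimal sketch of one plausible hand-off follows. It assumes the segmentation stage yields a binary pixel mask, and it turns that mask into per-structure crops. The names `mask_to_boxes`, `crop`, and `min_pixels` are hypothetical, and this is not the authors' implementation.

```python
from collections import deque


def mask_to_boxes(mask, min_pixels=1):
    """Return inclusive (top, left, bottom, right) boxes, one per
    4-connected region of truthy pixels in a 2D mask (list of rows).

    Regions with fewer than `min_pixels` pixels are dropped, as a crude
    stand-in for filtering out false-positive segments.
    """
    if not mask:
        return []
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes


def crop(image, box):
    """Cut the inclusive box out of a 2D image stored as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy mask with two separate regions standing in for two drawn structures.
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
toy_boxes = mask_to_boxes(toy_mask)
toy_crops = [crop(toy_mask, box) for box in toy_boxes]
```

In a real pipeline each crop would be taken from the page image rather than from the mask and passed to the structure-prediction model. The flood fill is written out only to keep the sketch dependency-free; a library connected-component routine would do the same job.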
