Automated extraction of chemical structure information from digital raster images.

Authors

Park Jungkap, Rosania Gus R, Shedden Kerby A, Nguyen Mandee, Lyu Naesung, Saitou Kazuhiro

Affiliation

Michigan Alliance for Cheminformatic Exploration, Ann Arbor, MI, USA.

Publication

Chem Cent J. 2009 Feb 5;3:4. doi: 10.1186/1752-153X-3-4.

DOI:10.1186/1752-153X-3-4
PMID:19196483
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2648963/
Abstract

BACKGROUND

To search for chemical structures in research articles, the diagrams or text representing molecules must be translated into a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated.
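To make "translated to a standard chemical file format" concrete, here is a minimal sketch that writes an already-recognized 2D structure (atoms with image coordinates plus bonds) as an MDL V2000 molfile. The abstract does not say which formats ChemReader emits; the choice of molfile, the `Atom`/`Bond` classes and `to_molfile` are hypothetical, and charges, stereo and other per-atom fields are simply left at zero.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str  # element symbol recognized from the diagram, e.g. "C" or "O"
    x: float     # image coordinates (column, row); units are arbitrary
    y: float


@dataclass
class Bond:
    a: int      # 1-based index of the first atom
    b: int      # 1-based index of the second atom
    order: int  # 1 = single, 2 = double, 3 = triple


def to_molfile(name: str, atoms: list[Atom], bonds: list[Bond]) -> str:
    """Serialize a recognized 2D structure as a minimal MDL V2000 molfile."""
    lines = [name, "", ""]  # header block: title, program line, comment (left empty)
    # Counts line: atom and bond counts, remaining fields 0, the obsolete "999", version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for atom in atoms:
        # Image rows grow downward while molfile y grows upward, so y is negated;
        # "0.0 - y" avoids printing "-0.0000" for atoms on row zero.
        y = 0.0 - atom.y
        lines.append(f"{atom.x:10.4f}{y:10.4f}{0.0:10.4f} {atom.symbol:<3} 0" + "  0" * 11)
    for bond in bonds:
        lines.append(f"{bond.a:3d}{bond.b:3d}{bond.order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines)


if __name__ == "__main__":
    # A C-C=O fragment as it might come out of a hypothetical recognition step.
    atoms = [Atom("C", 0.0, 0.0), Atom("C", 1.5, 0.0), Atom("O", 2.25, -1.3)]
    bonds = [Bond(1, 2, 1), Bond(2, 3, 2)]
    print(to_molfile("fragment from image", atoms, bonds))
```

Once a structure is in a format like this, any molfile-aware search engine can index it; the hard part, which the rest of the abstract addresses, is recovering the atoms and bonds from pixels in the first place.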

RESULTS

This paper aims to provide a critical review of these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently and in sequence from a graphical user interface, and their parameters can be readily changed, which facilitates further development tailored to a specific chemical database annotation scheme. On several sets of sample images from diverse sources, our results indicate that ChemReader outperforms existing software programs such as OSRA, Kekule, and CLiDE in both the rate of correct outputs and the accuracy of extracting molecular substructure patterns.
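The abstract does not name the line-recognition algorithm. As a hedged illustration of a bond-line detection stage with an adjustable parameter, the sketch below uses the classic Hough transform, a stand-in that is not necessarily ChemReader's method; the function `hough_lines` and its `min_votes` threshold are hypothetical.

```python
import math
from collections import Counter


def hough_lines(pixels, n_theta=180, min_votes=20):
    """Detect straight lines through foreground pixels with a Hough accumulator.

    pixels    -- iterable of (x, y) coordinates of dark pixels in a binarized image
    n_theta   -- number of angle bins sampled uniformly over [0, 180) degrees
    min_votes -- tunable threshold: minimum collinear pixel count to report a line
    Returns (rho, theta_degrees, votes) tuples, strongest line first.
    """
    acc = Counter()
    for x, y in pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta), rounded to a bin.
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] += 1
    peaks = [(rho, 180.0 * t / n_theta, votes)
             for (rho, t), votes in acc.items() if votes >= min_votes]
    return sorted(peaks, key=lambda peak: peak[2], reverse=True)


if __name__ == "__main__":
    # A hypothetical horizontal bond: 100 dark pixels on row y = 5.
    bond_pixels = [(x, 5) for x in range(100)]
    print(hough_lines(bond_pixels)[0])  # (5, 90.0, 100)
```

Raising `min_votes` suppresses short spurious strokes (for example parts of atom-label glyphs) at the cost of missing short bonds, which is the kind of trade-off an adjustable parameter lets a user tune per image set. Because neighbouring (rho, theta) cells also collect votes from the same pixels, a practical detector would add non-maximum suppression and recover segment endpoints before turning lines into bonds.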

CONCLUSION

The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating entries with published research articles. Given its stable performance and high accuracy, ChemReader may be accurate enough to annotate chemical databases with links to scientific research articles.
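
The annotation use case can be pictured as an index from a structure identifier to the articles in which that structure was found. A minimal sketch, assuming each extracted structure has already been reduced to a canonical key (for example a canonical SMILES or InChIKey produced by an external toolkit, not shown); the class `StructureAnnotationIndex` and its methods are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class StructureAnnotationIndex:
    """Links canonical structure keys to the articles in which they were found."""

    def __init__(self) -> None:
        # structure key -> set of article identifiers (a set ignores duplicate links)
        self._links = defaultdict(set)

    def annotate(self, structure_key: str, article_id: str) -> None:
        """Record that the structure with this key appears in the given article."""
        self._links[structure_key].add(article_id)

    def articles_for(self, structure_key: str) -> list[str]:
        """Return linked article identifiers in sorted order ([] if none)."""
        # .get avoids inserting an empty entry for keys that were never annotated
        return sorted(self._links.get(structure_key, ()))


if __name__ == "__main__":
    index = StructureAnnotationIndex()
    # "CC=O" stands in for a canonical key; the DOI is this article's, used as an example.
    index.annotate("CC=O", "10.1186/1752-153X-3-4")
    index.annotate("CC=O", "10.1186/1752-153X-3-4")  # duplicate link is ignored
    print(index.articles_for("CC=O"))  # ['10.1186/1752-153X-3-4']
```

Keying on a canonical identifier rather than on the raw extracted drawing means the same compound depicted differently in two articles still lands on one database entry, provided the extraction itself is correct.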

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/1911900512b1/1752-153X-3-4-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/8744493b699c/1752-153X-3-4-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/56814513019f/1752-153X-3-4-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/f7669d18a491/1752-153X-3-4-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/64871049bb44/1752-153X-3-4-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/dc2c343a07d5/1752-153X-3-4-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/ec3768083d83/1752-153X-3-4-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/d0a1f66b10fb/1752-153X-3-4-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/cf8673f593a0/1752-153X-3-4-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/8a2fc1be418b/1752-153X-3-4-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/2f752ebbff4e/1752-153X-3-4-11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/726f460f787d/1752-153X-3-4-12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/c4ad577c26dc/1752-153X-3-4-13.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ed6/2648963/836fd5c02fe6/1752-153X-3-4-14.jpg

Similar articles

1
Automated extraction of chemical structure information from digital raster images.
Chem Cent J. 2009 Feb 5;3:4. doi: 10.1186/1752-153X-3-4.
2
Tunable machine vision-based strategy for automated annotation of chemical databases.
J Chem Inf Model. 2009 Aug;49(8):1993-2001. doi: 10.1021/ci900029v.
3
[Construction of chemical information database based on optical structure recognition technique].
Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):352-357.
4
CLiDE Pro: the latest generation of CLiDE, a tool for optical chemical structure recognition.
J Chem Inf Model. 2009 Apr;49(4):780-7. doi: 10.1021/ci800449t.
5
Optical structure recognition software to recover chemical information: OSRA, an open source solution.
J Chem Inf Model. 2009 Mar;49(3):740-3. doi: 10.1021/ci800067r.
6
Automated teaching file and slide database for digital images.
AJR Am J Roentgenol. 2000 Nov;175(5):1249-51. doi: 10.2214/ajr.175.5.1751249.
7
Chemical machine vision: automated extraction of chemical metadata from raster images.
J Chem Inf Comput Sci. 2003 Sep-Oct;43(5):1342-55. doi: 10.1021/ci034017n.
8
ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files.
J Cheminform. 2016 Dec 29;8:73. doi: 10.1186/s13321-016-0175-x. eCollection 2016.
9
ReactionDataExtractor: A Tool for Automated Extraction of Information from Chemical Reaction Schemes.
J Chem Inf Model. 2021 Oct 25;61(10):4962-4974. doi: 10.1021/acs.jcim.1c01017. Epub 2021 Sep 15.
10
Beyond the black stump: rapid reviews of health research issues affecting regional, rural and remote Australia.
Med J Aust. 2020 Dec;213 Suppl 11:S3-S32.e1. doi: 10.5694/mja2.50881.

Cited by

1
MolNexTR: a generalized deep learning model for molecular image recognition.
J Cheminform. 2024 Dec 18;16(1):141. doi: 10.1186/s13321-024-00926-w.
2
Automation and machine learning augmented by large language models in a catalysis study.
Chem Sci. 2024 Jun 26;15(31):12200-12233. doi: 10.1039/d3sc07012c. eCollection 2024 Aug 7.
3
Automated molecular structure segmentation from documents using ChemSAM.
J Cheminform. 2024 Mar 12;16(1):29. doi: 10.1186/s13321-024-00823-2.
4
ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes.
J Chem Inf Model. 2023 Oct 9;63(19):6053-6067. doi: 10.1021/acs.jcim.3c00422. Epub 2023 Sep 20.
5
Review of techniques and models used in optical chemical structure recognition in images and scanned documents.
J Cheminform. 2022 Sep 9;14(1):61. doi: 10.1186/s13321-022-00642-3.
6
SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer.
J Cheminform. 2022 Jul 1;14(1):41. doi: 10.1186/s13321-022-00624-5.
7
DECIMER-hand-drawn molecule images dataset.
J Cheminform. 2022 Jun 9;14(1):36. doi: 10.1186/s13321-022-00620-9.
8
Img2Mol - accurate SMILES recognition from molecular graphical depictions.
Chem Sci. 2021 Sep 29;12(42):14174-14181. doi: 10.1039/d1sc01839f. eCollection 2021 Nov 3.
9
ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning.
Chem Sci. 2021 Jul 3;12(31):10622-10633. doi: 10.1039/d1sc02957f. eCollection 2021 Aug 11.
10
A review of optical chemical structure recognition tools.
J Cheminform. 2020 Oct 7;12(1):60. doi: 10.1186/s13321-020-00465-0.

References

1
Reconstruction of chemical molecules from images.
Annu Int Conf IEEE Eng Med Biol Soc. 2007;2007:4609-12. doi: 10.1109/IEMBS.2007.4353366.
2
A cheminformatic toolkit for mining biomedical knowledge.
Pharm Res. 2007 Oct;24(10):1791-802. doi: 10.1007/s11095-007-9285-5. Epub 2007 Mar 24.
3
Chemical machine vision: automated extraction of chemical metadata from raster images.
J Chem Inf Comput Sci. 2003 Sep-Oct;43(5):1342-55. doi: 10.1021/ci034017n.