Optical structure recognition software to recover chemical information: OSRA, an open source solution.

Author information

Filippov Igor V, Nicklaus Marc C

Affiliation

Laboratory of Medicinal Chemistry, SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702, USA.

Publication information

J Chem Inf Model. 2009 Mar;49(3):740-3. doi: 10.1021/ci800067r.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2889020/

Abstract

Until recently, most scientific and patent documents dealing with chemistry have described molecular structures either with systematic names or with graphical images of Kekulé structures. The latter method poses inherent problems for the automated processing that is needed when the number of documents reaches the hundreds of thousands or even millions, since graphical representations cannot be directly interpreted by a computer. To recover this structural information, which is otherwise all but lost, we have built OSRA, an optical structure recognition application based on modern advances in image processing as implemented in open source tools. OSRA can read documents in over 90 graphical formats, including GIF, JPEG, PNG, TIFF, PDF, and PS; it automatically recognizes and extracts the graphical information representing chemical structures in such documents and generates the SMILES or SD representation of each molecular structure image it encounters.
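To make the batch use case concrete, here is a minimal sketch of scripting OSRA over many files. It assumes an `osra` executable on the PATH that, given a single image or document path, prints one SMILES string per recognized structure to standard output; check the documentation for your OSRA version before relying on that. The helper names `run_osra` and `batch_extract`, and the extension filter (limited to the formats named in the abstract, a small subset of what OSRA reads), are hypothetical illustrations, not part of OSRA itself.

```python
import subprocess
from pathlib import Path

# Hypothetical filter: only the formats named in the abstract (OSRA reads 90+).
SUPPORTED_SUFFIXES = {".gif", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".pdf", ".ps"}


def run_osra(path):
    """Run the osra CLI on one file and return the SMILES lines it prints.

    Assumption: default output is one SMILES string per line on stdout.
    Raises subprocess.CalledProcessError if osra exits with a nonzero status.
    """
    result = subprocess.run(
        ["osra", str(path)], capture_output=True, text=True, check=True
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def batch_extract(paths, runner=run_osra):
    """Map each supported file path to the list of SMILES recognized in it.

    `runner` is injectable so the batch logic can be tested without osra.
    Files whose extension is not in SUPPORTED_SUFFIXES are skipped.
    """
    results = {}
    for raw in paths:
        path = Path(raw)
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        results[str(path)] = runner(path)
    return results
```

Passing `runner` as a parameter keeps the file-selection logic independent of the external tool, so it can be exercised with a stub that returns fixed SMILES; the default `run_osra` is only invoked when OSRA is actually installed.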
