Chemical name to structure: OPSIN, an open source solution.

Affiliation

Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, England.

Publication information

J Chem Inf Model. 2011 Mar 28;51(3):739-53. doi: 10.1021/ci100384d. Epub 2011 Mar 9.

Abstract

We have produced an open source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed and then operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open source developments of chemical name interpretation.
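
The idea of a regular grammar guiding tokenization can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not OPSIN's implementation. The token classes, the tiny vocabulary and the helpers (`VOCAB`, `TRANSITIONS`, `candidates`, `tokenize`) are hypothetical. The grammar is written as a finite-state transition table. A depth-first search tries only the token classes that the current state allows and backtracks out of dead ends. This is how the grammar resolves ambiguities such as `meth` versus `methyl`.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Hypothetical toy vocabulary; a real parser needs far larger token lists.
VOCAB = {
    "multiplier": ["di", "tri", "tetra"],
    "substituent": ["methyl", "ethyl", "chloro", "bromo"],
    "chain": ["meth", "eth", "prop", "but", "pent", "hex"],
    "suffix": ["ane", "ene"],
}
LOCANT = re.compile(r"\d+(?:,\d+)*-")  # e.g. "2-" or "2,2-"

# Regular grammar as a finite-state transition table: the state is the
# class of the last token read, mapped to the classes allowed next.
# At the start, "chain" is tried before "substituent", so a name such as
# "methylpropane" first tries "meth" and has to backtrack.
TRANSITIONS = {
    "start": ["locant", "multiplier", "chain", "substituent"],
    "locant": ["multiplier", "substituent"],
    "multiplier": ["substituent"],
    "substituent": ["locant", "multiplier", "substituent", "chain"],
    "chain": ["suffix"],
    "suffix": [],
}
ACCEPTING = {"suffix"}

Token = Tuple[str, str]  # (token class, matched text)


def candidates(name: str, pos: int, token_class: str) -> Iterator[str]:
    """Yield every token of token_class that starts at name[pos]."""
    if token_class == "locant":
        match = LOCANT.match(name, pos)
        if match:
            yield match.group()
        return
    for word in VOCAB[token_class]:
        if name.startswith(word, pos):
            yield word


def tokenize(name: str, state: str = "start", pos: int = 0) -> Optional[List[Token]]:
    """Split name into grammar-approved tokens, or return None if impossible.

    Depth-first search: only token classes allowed by the current state are
    tried, and a branch that cannot reach an accepting state is abandoned.
    """
    if pos == len(name):
        return [] if state in ACCEPTING else None
    for token_class in TRANSITIONS[state]:
        for token in candidates(name, pos, token_class):
            rest = tokenize(name, token_class, pos + len(token))
            if rest is not None:
                return [(token_class, token)] + rest
    return None


print(tokenize("2,2-dimethylbutane"))
# [('locant', '2,2-'), ('multiplier', 'di'), ('substituent', 'methyl'), ('chain', 'but'), ('suffix', 'ane')]
```

In `methylpropane`, the chain reading `meth` is tried first. It is abandoned because the grammar only allows a suffix after a chain, and `ylpropane` does not start with one. The substituent reading `methyl` therefore wins. A name like `methylmethyl` returns `None` because no path ends in the accepting state. This sketch stops at tokenization. It does not build a parse tree or a structure.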
