ChemSchematicResolver: A Toolkit to Decode 2D Chemical Diagrams with Labels and R-Groups into Annotated Chemical Named Entities.

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.

ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.

Publication information

J Chem Inf Model. 2020 Apr 27;60(4):2059-2072. doi: 10.1021/acs.jcim.0c00042. Epub 2020 Apr 7.

Abstract

The number of journal articles in the scientific domain has grown to the point where it has become impossible for researchers to capitalize on all findings in their relevant discipline. Information is stored in these articles in a number of ways, including figures that describe important results. In organic chemistry, these figures often present chemical schematic diagrams that graphically define the structures of carbon-based compounds. These diagrams are intuitive for an expert to comprehend, but they are not designed for machines. This work presents ChemSchematicResolver, a software tool that can be used to identify chemical schematic diagrams within the figure of a document, resolve any R-group substituents within them, and convert the resulting diagrams to a machine-readable format in a high-throughput, autonomous fashion. The tool includes a new algorithm that is used to identify relevant diagrams and a mechanism that combines these data with contextual information from the rest of the document for the creation of highly relational databases. It includes support for a variety of general R-group structures, the first time this is available in any open-source chemical schematic diagram extraction tool. It is presented alongside a self-generated evaluation set, on which the most important assessment metric, precision, achieved 83-100% for all assessed areas. The ChemSchematicResolver tool is released under the MIT license and is available to download from www.chemschematicresolver.org.
