BigSMARTS: A Topologically Aware Query Language and Substructure Search Algorithm for Polymer Chemical Structures.

Authors

Rebello Nathan J, Lin Tzyy-Shyang, Nazeer Heeba, Olsen Bradley D

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.

Department of Computer Science, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States.

Publication information

J Chem Inf Model. 2023 Nov 13;63(21):6555-6568. doi: 10.1021/acs.jcim.3c00978. Epub 2023 Oct 24.

DOI:10.1021/acs.jcim.3c00978

PMID:37874026

Abstract

Molecular search is important in chemistry, biology, and informatics for identifying molecular structures within large data sets, improving knowledge discovery and innovation, and making chemical data FAIR (findable, accessible, interoperable, reusable). Search algorithms for polymers are significantly less developed than those for small molecules because polymer search relies on searching by polymer name. Name-based search is challenging because polymer names are often overly broad (e.g., polyethylene), become complicated for complex chemical structures, and often do not follow official IUPAC conventions. Chemical structure search in polymers is limited to substructures, such as monomers, without awareness of connectivity or topology. This work introduces a novel query language and graph traversal search algorithm for polymers that provides the first search method able to fully capture all of the chemical structures present in polymers. The BigSMARTS query language, an extension of the small-molecule SMARTS language, allows users to write queries that localize monomer and functional group searches to different parts of the polymer, such as the middle block of a triblock, the side chain of a graft, and the backbone of a repeat unit. The substructure search algorithm is based on the traversal of graph representations of the generating functions for the stochastic graphs of polymers. Operationally, the algorithm first identifies cycles representing the monomers, then identifies the end groups, and finally performs a depth-first search to match entire subgraphs. To validate the algorithm, hundreds of queries were searched against hundreds of target chemistries and topologies from the literature, giving approximately 440,000 query-target pairs. This tool provides a detailed algorithm that can be implemented in search engines to provide search results with full matching of the monomer connectivity and polymer topology.
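The final step named in the abstract, a depth-first search that matches entire subgraphs, can be illustrated with a generic backtracking matcher on small labeled graphs. This is only a sketch under stated assumptions and not the authors' BigSMARTS algorithm: the graph encoding (adjacency dictionaries with one element symbol per node), the function names `find_match` and `dfs_order`, and the toy target graph are all hypothetical, and the matcher maps each query atom to a distinct target node, so it cannot traverse a repeat-unit cycle more than once the way a search over a stochastic polymer graph would need to.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[int, Set[int]]  # adjacency: node id -> set of neighbor ids


def dfs_order(graph: Graph) -> List[int]:
    """List the nodes in depth-first visiting order."""
    seen: Set[int] = set()
    order: List[int] = []
    for start in sorted(graph):
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            stack.extend(sorted(graph[node] - seen, reverse=True))
    return order


def find_match(query: Graph, q_labels: Dict[int, str],
               target: Graph, t_labels: Dict[int, str]) -> Optional[Dict[int, int]]:
    """Return one mapping query node -> target node, or None if none exists."""
    order = dfs_order(query)

    def extend(mapping: Dict[int, int], depth: int) -> Optional[Dict[int, int]]:
        if depth == len(order):
            return dict(mapping)  # every query node has been placed
        q = order[depth]
        for t in target:
            if t in mapping.values() or q_labels[q] != t_labels[t]:
                continue
            # Each already-mapped neighbor of q must map to a neighbor of t.
            if all(mapping[n] in target[t] for n in query[q] if n in mapping):
                mapping[q] = t
                found = extend(mapping, depth + 1)
                if found is not None:
                    return found
                del mapping[q]  # backtrack and try the next candidate
        return None

    return extend({}, 0)


# Hypothetical toy target: a C-C-O repeat unit closed into a cycle
# (node 2 bonds back to node 0), with an N end group attached to node 0.
target: Graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
t_labels = {0: "C", 1: "C", 2: "O", 3: "N"}

# Query 1: a C-C-O chain, which lies inside the repeat unit.
chain: Graph = {0: {1}, 1: {0, 2}, 2: {1}}
chain_labels = {0: "C", 1: "C", 2: "O"}

# Query 2: an N-O bond, which does not occur in the target.
n_o: Graph = {0: {1}, 1: {0}}
n_o_labels = {0: "N", 1: "O"}

chain_hit = find_match(chain, chain_labels, target, t_labels)
n_o_hit = find_match(n_o, n_o_labels, target, t_labels)
print("C-C-O found:", chain_hit is not None)
print("N-O found:", n_o_hit is not None)
```

Closing the repeat unit into a cycle is a simplification chosen here to echo the abstract's description of monomers as cycles in the graph representation; the published method's actual data structures are not given in this record.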


Similar articles

Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation.

J Med Internet Res. 2024 May 30;26:e52655. doi: 10.2196/52655.

Efficient substructure searching of large chemical libraries: the ABCD chemical cartridge.

J Chem Inf Model. 2011 Dec 27;51(12):3113-30. doi: 10.1021/ci200413e. Epub 2011 Nov 14.

RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries.

J Chem Inf Model. 2024 Apr 22;64(8):2948-2954. doi: 10.1021/acs.jcim.4c00138. Epub 2024 Mar 15.

Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature.

J Chem Inf Model. 2021 Nov 22;61(11):5395-5413. doi: 10.1021/acs.jcim.1c01031. Epub 2021 Oct 18.

Smiles2Monomers: a link between chemical and biological structures for polymers.

J Cheminform. 2015 Dec 29;7:62. doi: 10.1186/s13321-015-0111-5. eCollection 2015.

AMBIT-SMARTS: Efficient Searching of Chemical Structures and Fragments.

Mol Inform. 2011 Aug;30(8):707-20. doi: 10.1002/minf.201100028. Epub 2011 Aug 4.

Scaffold hopping using clique detection applied to reduced graphs.

J Chem Inf Model. 2006 Mar-Apr;46(2):503-11. doi: 10.1021/ci050347r.

Molecular query language (MQL)--a context-free grammar for substructure matching.

J Chem Inf Model. 2007 Mar-Apr;47(2):295-301. doi: 10.1021/ci600305h.

LEAP into the Pfizer Global Virtual Library (PGVL) space: creation of readily synthesizable design ideas automatically.

Methods Mol Biol. 2011;685:253-76. doi: 10.1007/978-1-60761-931-4_13.
