CLCA: maximum common molecular substructure queries within the MetRxn database.

Affiliation

The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States.

Publication information

J Chem Inf Model. 2014 Dec 22;54(12):3417-38. doi: 10.1021/ci5003922. Epub 2014 Dec 1.

Abstract

The challenge of automatically identifying the preserved molecular moieties in a chemical reaction is referred to as the atom mapping problem. Reaction atom maps provide the ability to locate the fate of individual atoms across an entire metabolic network. Atom maps are used to track atoms in isotope labeling experiments for metabolic flux elucidation, trace novel biosynthetic routes to a target compound, and contrast entire pathways for structural homology. However, rapid computation of the reaction atom mappings remains elusive despite significant research. We present a novel substructure search algorithm, canonical labeling for clique approximation (CLCA), with polynomial run-time complexity to quickly generate atom maps for all the reactions present in MetRxn. CLCA uses number theory (i.e., prime factorization) to generate canonical labels or unique IDs and identify a bijection between the vertices (atoms) of two distinct molecular graphs. CLCA utilizes molecular graphs generated by combining atomistic information on reactions and metabolites from 112 metabolic models and 8 metabolic databases. CLCA offers improvements in run time, accuracy, and memory utilization over existing heuristic and combinatorial maximum common substructure (MCS) search algorithms. We provide detailed examples on the various advantages as well as failure modes of CLCA over existing algorithms.
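The abstract only names the idea of using prime factorization to build canonical labels, so the following is a minimal illustrative sketch of that general idea and should not be read as the CLCA algorithm itself. It assumes a molecule is given as a list of element symbols plus a list of bonded index pairs (hydrogens omitted), and the function names `first_primes` and `refine_labels` are hypothetical. Each atom class is assigned a prime, and an atom's new key combines its own prime with the product of its neighbours' primes. Because prime factorizations are unique, that product identifies the multiset of neighbouring classes.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def refine_labels(atoms, bonds, rounds=3):
    """Iteratively refine per-atom keys using products of primes.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into atoms
    Returns one key per atom; atoms with equal keys are treated as
    equivalent by this sketch (an assumption, not a proof of symmetry).
    """
    n = len(atoms)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    keys = list(atoms)  # initial classes: element symbol only
    for _ in range(rounds):
        # Rank the distinct keys and give each class its own prime.
        classes = sorted(set(keys))
        prime_of = dict(zip(classes, first_primes(len(classes))))
        labels = [prime_of[k] for k in keys]
        # New key: own prime plus the product of the neighbours' primes.
        keys = [
            (labels[i], prod(labels[j] for j in neighbors[i]))
            for i in range(n)
        ]
    return keys


# Heavy-atom skeletons only (hypothetical toy inputs).
propane = refine_labels(["C", "C", "C"], [(0, 1), (1, 2)])
ethanol = refine_labels(["C", "C", "O"], [(0, 1), (1, 2)])
```

In this toy run the two terminal carbons of propane end up with the same key while the central carbon differs, and the three heavy atoms of ethanol all receive distinct keys. Keys computed this way are only comparable within one molecule, so matching atoms across the two sides of a reaction would need additional steps that this sketch does not attempt.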
