Clustering a database of optically absorbing organic molecules via a hierarchical fingerprint scheme that categorizes similar functional molecular fragments.

Author information

Flanagan Padraic J, Cole Jacqueline M

Affiliation

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom.

Publication information

J Chem Phys. 2022 Apr 21;156(15):154110. doi: 10.1063/5.0087603.

Abstract

A measure of chemical similarity is only useful if it implies similarity in some relevant property space. Typically, similarity calculations operate by assigning each molecule a chemical fingerprint: a fixed-length vector of bits where the on-bits signify the presence of a certain feature. Common fingerprinting schemes, such as extended-connectivity fingerprints, are by definition general and fail to capture much of the domain-specific theory that underpins similarity in a specific domain. In this work, a hierarchical fingerprinting scheme is developed that is bespoke to a database of ∼4500 organic molecules and their cognate optical absorption spectral properties. Our fingerprinting scheme incorporates molecular fragmentation and domain-specific chemical intuition into an algorithm that categorizes each fragment as being one of a core chemical group, a substituent, or a bridge. The algorithm is applied to every molecule in the database to generate a pool of chemically relevant fragments that are labeled according to their structural category. The fingerprint of each molecule is then composed of a nested Python dictionary specifying the unique identifiers of its constituent fragment entities and the structural links between them to give a hierarchical molecular encoding scheme. Four case studies show the application of our fingerprinting scheme to the subject database. In each case, the clustered molecules display a host of interesting chemical trends. The application that was used to develop and implement this bespoke fingerprinting scheme, referred to as ChemCluster, also exposes a host of other cheminformatics tools pertaining to this database, a selection of which is demonstrated in this work. The enhanced similarity comparisons afforded by our fingerprinting scheme, as well as the large repository of categorized fragments generated during its development, constitute the first step toward using this database in a data-driven materials discovery workflow.
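The nested-dictionary encoding described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not give the actual ChemCluster dictionary layout, the fragment identifiers (`C001`, `S010`, `B003`, and so on are hypothetical), or the similarity measure used for clustering. The sketch flattens each nested fingerprint into a set of parent-to-fragment links and compares two molecules with a Jaccard (Tanimoto-style) ratio, which is a common choice for set-based fingerprints and is swapped in here rather than taken from the paper.

```python
from typing import Set, Tuple

Link = Tuple[str, str, str]  # (parent_id, category, fragment_id)


def flatten(fingerprint: dict, parent: str = "root") -> Set[Link]:
    """Collect every structural link in a nested fingerprint.

    Assumed layout (hypothetical, not the published ChemCluster format):
    {category: {fragment_id: {category: {fragment_id: {...}}}}}
    where category is "core", "substituent", or "bridge".
    """
    links: Set[Link] = set()
    for category, fragments in fingerprint.items():
        for fragment_id, children in fragments.items():
            links.add((parent, category, fragment_id))
            # Recurse so that links deeper in the hierarchy are kept too.
            links |= flatten(children, parent=fragment_id)
    return links


def jaccard(fp_a: dict, fp_b: dict) -> float:
    """Shared links divided by all links; 1.0 means identical encodings."""
    a, b = flatten(fp_a), flatten(fp_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical molecules sharing a core, one substituent, and a bridge,
# but differing in a second substituent and in the core the bridge leads to.
mol_a = {
    "core": {
        "C001": {
            "substituent": {"S010": {}, "S022": {}},
            "bridge": {"B003": {"core": {"C007": {}}}},
        }
    }
}
mol_b = {
    "core": {
        "C001": {
            "substituent": {"S010": {}},
            "bridge": {"B003": {"core": {"C009": {}}}},
        }
    }
}

print(jaccard(mol_a, mol_b))  # 3 shared links out of 6 distinct links -> 0.5
```

Because each link records its parent, two molecules that contain the same fragments attached in different places score lower than two that share both fragments and connectivity, which is the kind of distinction a flat bit vector does not make directly.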
