Clustering a database of optically absorbing organic molecules via a hierarchical fingerprint scheme that categorizes similar functional molecular fragments.

Authors

Flanagan Padraic J, Cole Jacqueline M

Affiliation

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom.

Publication

J Chem Phys. 2022 Apr 21;156(15):154110. doi: 10.1063/5.0087603.

DOI: 10.1063/5.0087603
PMID: 35459320

Abstract

A measure of chemical similarity is only useful if it implies similarity in some relevant property space. Typically, similarity calculations operate by assigning each molecule a chemical fingerprint: a fixed-length vector of bits where the on-bits signify the presence of a certain feature. Common fingerprinting schemes, such as extended-connectivity fingerprints, are by definition general and fail to capture much of the domain-specific theory that underpins similarity in a specific domain. In this work, a hierarchical fingerprinting scheme is developed that is bespoke to a database of ∼4500 organic molecules and their cognate optical absorption spectral properties. Our fingerprinting scheme incorporates molecular fragmentation and domain-specific chemical intuition into an algorithm that categorizes each fragment as being one of a core chemical group, a substituent, or a bridge. The algorithm is applied to every molecule in the database to generate a pool of chemically relevant fragments that are labeled according to their structural category. The fingerprint of each molecule is then composed of a nested Python dictionary specifying the unique identifiers of its constituent fragment entities and the structural links between them to give a hierarchical molecular encoding scheme. Four case studies show the application of our fingerprinting scheme to the subject database. In each case, the clustered molecules display a host of interesting chemical trends. The application that was used to develop and implement this bespoke fingerprinting scheme, referred to as ChemCluster, also exposes a host of other cheminformatics tools pertaining to this database, a selection of which is demonstrated in this work. The enhanced similarity comparisons afforded by our fingerprinting scheme, as well as the large repository of categorized fragments generated during its development, constitute the first step toward using this database in a data-driven materials discovery workflow.
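The contrast the abstract draws between a flat bit-vector fingerprint and a nested-dictionary fingerprint can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the fragment identifiers, the dictionary keys (`cores`, `substituents`, `bridges`), and the Jaccard-style comparison are hypothetical and are not taken from ChemCluster or from the paper's database, whose actual schema and similarity measure the abstract does not specify.

```python
from typing import Dict, Set, Tuple

# Hypothetical fragment pool: identifier -> structural category.
# The identifiers are illustrative placeholders, not entries from the paper's database.
FRAGMENT_POOL: Dict[str, str] = {
    "core_001": "core",
    "core_002": "core",
    "sub_001": "substituent",
    "sub_002": "substituent",
    "bridge_001": "bridge",
}


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto coefficient of two conventional bit-vector fingerprints,
    each stored as a Python int whose set bits mark present features."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


# Assumed nested-dictionary layout: each core lists its substituents and the
# bridges that link it to another core. A real scheme would also need to
# handle repeated cores, which this simplified layout cannot represent.
mol_a = {
    "cores": {
        "core_001": {"substituents": ["sub_001"],
                     "bridges": {"bridge_001": "core_002"}},
        "core_002": {"substituents": ["sub_002"], "bridges": {}},
    }
}
mol_b = {
    "cores": {
        "core_001": {"substituents": ["sub_001"], "bridges": {}},
    }
}


def flatten(fingerprint: dict) -> Set[Tuple[str, ...]]:
    """Turn the nested fingerprint into a set of category-labelled features
    that keep track of which core each substituent or bridge hangs from."""
    features: Set[Tuple[str, ...]] = set()
    for core_id, links in fingerprint["cores"].items():
        features.add(("core", core_id))
        for sub_id in links["substituents"]:
            features.add(("substituent", core_id, sub_id))
        for bridge_id, target in links["bridges"].items():
            features.add(("bridge", core_id, bridge_id, target))
    return features


def hierarchical_similarity(fp_a: dict, fp_b: dict) -> float:
    """Jaccard overlap of the flattened features (a stand-in measure,
    not the comparison used in the paper)."""
    fa, fb = flatten(fp_a), flatten(fp_b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


if __name__ == "__main__":
    print(tanimoto_bits(0b1011, 0b0011))
    print(hierarchical_similarity(mol_a, mol_b))
```

In this sketch the nested form records that `sub_001` is attached to `core_001` specifically, so two molecules that contain the same fragments in different arrangements would not be scored as identical, which a plain presence/absence bit vector cannot distinguish.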
