Visualization of molecular fingerprints.

Affiliation

Nonlinearity and Complexity Research Group, Aston University, Aston Triangle, Birmingham B4 7ET, United Kingdom.

Publication information

J Chem Inf Model. 2011 Jul 25;51(7):1552-63. doi: 10.1021/ci1004042. Epub 2011 Jul 8.

Abstract

A visualization plot of a data set of molecular data is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.
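The abstract names local k-nearest-neighbor predictors as one objective measure of visualization quality but does not give the exact protocol. The following is only a minimal sketch of how such a score might be computed, assuming leave-one-out evaluation, Euclidean distance in the 2-D visualization space, and a majority vote over class labels. The function name `knn_loo_accuracy` and the toy coordinates are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy for labelled points in a 2-D visualization space.

    Assumed protocol (not taken from the paper): Euclidean distance, majority vote.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    correct = 0
    for i, (p, true_label) in enumerate(zip(points, labels)):
        # Distances from point i to every other point; point i itself is held out.
        neighbours = sorted(
            (dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        # Majority vote over the labels of the k nearest neighbours.
        votes = Counter(labels[j] for _, j in neighbours)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == true_label
    return correct / len(points)


# Made-up 2-D coordinates standing in for a projection of fingerprints.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["A", "A", "A", "B", "B", "B"]
accuracy = knn_loo_accuracy(points, labels, k=2)
```

Under these assumptions, a projection that keeps members of the same sublibrary or activity class close together scores near 1, while a projection that mixes classes scores lower.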

Similar articles

Nonlinear dimensionality reduction and mapping of compound libraries for drug discovery.

J Mol Graph Model. 2012 Apr;34:108-17. doi: 10.1016/j.jmgm.2011.12.006. Epub 2012 Jan 2.

Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors.

J Chem Inf Model. 2011 Dec 27;51(12):3036-49. doi: 10.1021/ci2000083. Epub 2011 Dec 9.

Data visualization during the early stages of drug discovery.

J Chem Inf Model. 2006 Jul-Aug;46(4):1806-18. doi: 10.1021/ci050471a.

A comprehensive support vector machine binary hERG classification model based on extensive but biased end point hERG data sets.

Chem Res Toxicol. 2011 Jun 20;24(6):934-49. doi: 10.1021/tx200099j. Epub 2011 May 6.

Combinatorial QSAR modeling of specificity and subtype selectivity of ligands binding to serotonin receptors 5HT1E and 5HT1F.

J Chem Inf Model. 2008 May;48(5):997-1013. doi: 10.1021/ci700404c. Epub 2008 May 10.

Visualization of high-dimensional combinatorial catalysis data.

J Comb Chem. 2009 May-Jun;11(3):385-92. doi: 10.1021/cc800194j.

Generative topographic mapping applied to clustering and visualization of motor unit action potentials.

Biosystems. 2005 Dec;82(3):273-84. doi: 10.1016/j.biosystems.2005.09.004. Epub 2005 Oct 19.

Supervised self-organizing maps in drug discovery. 1. Robust behavior with overdetermined data sets.

J Chem Inf Model. 2005 Nov-Dec;45(6):1749-58. doi: 10.1021/ci0500839.

Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase.

J Chem Inf Model. 2011 Jul 25;51(7):1582-92. doi: 10.1021/ci200123y. Epub 2011 Jun 20.

Cited by

Discovery of Active Ingredient of Yinchenhao Decoction Targeting TLR4 for Hepatic Inflammatory Diseases Based on Deep Learning Approach.

Interdiscip Sci. 2025 Jun;17(2):293-305. doi: 10.1007/s12539-024-00670-7. Epub 2024 Nov 19.

Scaffold and Structural Diversity of the Secondary Metabolite Space of Medicinal Fungi.

ACS Omega. 2023 Jan 10;8(3):3102-3113. doi: 10.1021/acsomega.2c06428. eCollection 2023 Jan 24.

Discovery of novel chemical reactions by deep generative recurrent neural network.

Sci Rep. 2021 Feb 4;11(1):3178. doi: 10.1038/s41598-021-81889-y.

Chemistry in Times of Artificial Intelligence.

Chemphyschem. 2020 Oct 16;21(20):2233-2242. doi: 10.1002/cphc.202000518. Epub 2020 Sep 28.

VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder.

Molecules. 2020 Jul 29;25(15):3446. doi: 10.3390/molecules25153446.

A Novel Discovery: Holistic Efficacy at the Special Organ Level of Pungent Flavored Compounds from Pungent Traditional Chinese Medicine.

Int J Mol Sci. 2019 Feb 11;20(3):752. doi: 10.3390/ijms20030752.

Distributed Representation of Chemical Fragments.

ACS Omega. 2018 Mar 31;3(3):2825-2836. doi: 10.1021/acsomega.7b02045. Epub 2018 Mar 8.

Machine learning in chemoinformatics and drug discovery.

Drug Discov Today. 2018 Aug;23(8):1538-1546. doi: 10.1016/j.drudis.2018.05.010. Epub 2018 May 8.

Cheminformatic characterization of natural products from Panama.

Mol Divers. 2017 Nov;21(4):779-789. doi: 10.1007/s11030-017-9781-4. Epub 2017 Aug 22.

Predictive cartography of metal binders using generative topographic mapping.

J Comput Aided Mol Des. 2017 Aug;31(8):701-714. doi: 10.1007/s10822-017-0033-6. Epub 2017 Jul 7.
