Random or rational design? Evaluation of diverse compound subsets from chemical structure databases.

Author information

Pötter T, Matter H

Affiliation

BAYER AG, Landwirtschaftszentrum, Monheim, Germany.

Publication information

J Med Chem. 1998 Feb 12;41(4):478-88. doi: 10.1021/jm9700878.

DOI:10.1021/jm9700878

PMID:9484498

Abstract

The performance of rational design to maximize the structural diversity of databases for lead finding and lead refinement was investigated. Rational methods for designing compound subsets, such as maximum dissimilarity methods and hierarchical cluster analysis, were compared to a random approach to study how efficiently they enhance the diversity of three different databases. All investigations were based on 2D fingerprints as a validated molecular descriptor. To compare the performance of the rational selection methods to a random approach, we additionally used probability calculations. When using maximum dissimilarity-based selections, a single compound can be a member of different neighborhoods as defined by the similarity threshold value, while in hierarchical clustering each compound is assigned to only a single cluster. Therefore the relationship between the similarity threshold of the maximum diversity selection method and a 2D similarity search threshold was studied. In contrast to hierarchical cluster analysis, maximum dissimilarity selections allow a similarity threshold to be used when adding a new compound to an already selected compound list. Reasonable values for this similarity threshold are presented here. Maximum dissimilarity selections produced more diverse subsets, which cover more biological classes than random selections. An optimally diverse subset without redundant structures, containing only 38% of one original dataset, was generated in which no structure is more similar than 0.85 to its nearest neighbor, yet all biological classes are represented. When it is acceptable to cover only 90% of all biological targets, 3.5-3.7 times more compounds need to be selected using a random approach than using a rational design approach. This coverage rate is where design techniques show their highest efficiency relative to a random approach. In those subsets no compound is more similar than 0.70 to its nearest neighbor.
Furthermore, a comparative molecular field analysis (CoMFA) was used to evaluate designed and randomly chosen subsets of a database consisting of inhibitors of the angiotensin-converting enzyme. Subsets designed using maximum dissimilarity methods led to more stable quantitative structure-activity relationship (QSAR) models with higher predictive power than randomly chosen compounds. This predictive power is especially high when no compound in the test dataset has a similarity coefficient of less than 0.7 to its nearest neighbor in the training set.
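
The threshold-based maximum dissimilarity selection and the probability argument for random selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes fingerprints are given as sets of on-bit indices, uses the Tanimoto coefficient as the similarity measure, starts from an arbitrary seed compound, and models random selection with a hypergeometric calculation. All function names (`tanimoto`, `max_dissimilarity_select`, `p_class_hit`) and the toy fingerprints are hypothetical.

```python
from math import comb


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_dissimilarity_select(fps, threshold, seed_index=0):
    """Greedy selection (assumed variant): repeatedly add the compound whose
    highest similarity to the already selected compounds is lowest, and stop
    once every remaining compound is more similar than `threshold` to some
    selected compound."""
    selected = [seed_index]
    # nearest[i] = highest similarity of compound i to any selected compound
    nearest = [tanimoto(fp, fps[seed_index]) for fp in fps]
    remaining = set(range(len(fps))) - {seed_index}
    while remaining:
        best = min(remaining, key=lambda i: (nearest[i], i))
        if nearest[best] > threshold:
            break  # all remaining compounds are redundant at this threshold
        selected.append(best)
        remaining.discard(best)
        for i in remaining:
            nearest[i] = max(nearest[i], tanimoto(fps[i], fps[best]))
    return selected


def p_class_hit(n_total, class_size, n_picked):
    """Probability that a uniform random subset of n_picked compounds (with
    n_picked <= n_total) contains at least one member of a class that has
    class_size members in a database of n_total compounds."""
    return 1.0 - comb(n_total - class_size, n_picked) / comb(n_total, n_picked)


# Toy fingerprints (hypothetical data, not taken from the paper).
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 3, 4, 5}),
]
print(max_dissimilarity_select(toy_fps, threshold=0.7))  # [0, 2, 1]
print(p_class_hit(n_total=10, class_size=2, n_picked=3))
```

In this sketch no two selected compounds have a similarity above `threshold`, which mirrors the statement that no structure in a designed subset is more similar than the threshold to its nearest neighbor. The last compound in the toy data is rejected because its similarity of 0.8 to an already selected compound exceeds 0.7.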
