Reoptimization of MDL keys for use in drug discovery.

Authors

Durant Joseph L, Leland Burton A, Henry Douglas R, Nourse James G

Affiliation

MDL Information Systems, 14600 Catalina Street, San Leandro, California 94577, USA.

Publication details

J Chem Inf Comput Sci. 2002 Nov-Dec;42(6):1273-80. doi: 10.1021/ci010132r.

DOI: 10.1021/ci010132r
PMID: 12444722

Abstract

For a number of years MDL products have exposed both 166 bit and 960 bit keysets based on 2D descriptors. These keysets were originally constructed and optimized for substructure searching. We report on improvements in the performance of MDL keysets which are reoptimized for use in molecular similarity. Classification performance for a test data set of 957 compounds was increased from 0.65 for the 166 bit keyset and 0.67 for the 960 bit keyset to 0.71 for a surprisal S/N pruned keyset containing 208 bits and 0.71 for a genetic algorithm optimized keyset containing 548 bits. We present an overview of the underlying technology supporting the definition of descriptors and the encoding of these descriptors into keysets. This technology allows definition of descriptors as combinations of atom properties, bond properties, and atomic neighborhoods at various topological separations as well as supporting a number of custom descriptors. These descriptors can then be used to set one or more bits in a keyset. We constructed various keysets and optimized their performance in clustering bioactive substances. Performance was measured using methodology developed by Briem and Lessel. "Directed pruning" was carried out by eliminating bits from the keysets on the basis of random selection, values of the surprisal of the bit, or values of the surprisal S/N ratio of the bit. The random pruning experiment highlighted the insensitivity of keyset performance for keyset lengths of more than 1000 bits. Contrary to initial expectations, pruning on the basis of the surprisal values of the various bits resulted in keysets which underperformed those resulting from random pruning. In contrast, pruning on the basis of the surprisal S/N ratio was found to yield keysets which performed better than those resulting from random pruning. We also explored the use of genetic algorithms in the selection of optimal keysets. 
Once more the performance was only a weak function of keyset size, and the optimizations failed to identify a single globally optimal keyset. Instead multiple, equally optimal keysets could be produced which had relatively low overlap of the descriptors they encoded.
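
The pruning experiments described in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that the abstract does not confirm: the surprisal of a bit is taken to be the standard information-theoretic quantity -log2(p), where p is the fraction of molecules that set the bit; the abstract does not say whether low-surprisal or high-surprisal bits were eliminated, so the direction is left as a parameter; and the surprisal S/N ratio is not defined in the abstract, so it is not implemented here. All function names (`bit_surprisal`, `prune_by_score`, `prune_random`, `project`) are hypothetical and do not come from any MDL product.

```python
import math
import random
from typing import List, Sequence

Keyset = List[int]  # one 0/1 value per bit position


def bit_surprisal(fingerprints: Sequence[Keyset]) -> List[float]:
    """Per-bit surprisal -log2(p), with p the fraction of molecules setting the bit.

    Assumption: this is the textbook definition; the paper may use a variant.
    Bits that are never set get infinite surprisal.
    """
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    scores = []
    for j in range(n_bits):
        count = sum(fp[j] for fp in fingerprints)
        scores.append(math.inf if count == 0 else math.log2(n / count))
    return scores


def prune_by_score(scores: Sequence[float], keep: int, highest: bool = True) -> List[int]:
    """Return the sorted indices of the `keep` bits with the highest (or lowest) score.

    Any per-bit score can be passed in, so a surprisal S/N ratio could be
    plugged in once its definition is known.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=highest)
    return sorted(order[:keep])


def prune_random(n_bits: int, keep: int, seed: int = 0) -> List[int]:
    """Random-pruning baseline: keep `keep` bit positions chosen uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_bits), keep))


def project(fp: Keyset, kept: Sequence[int]) -> Keyset:
    """Reduce a fingerprint to the retained bit positions."""
    return [fp[j] for j in kept]


if __name__ == "__main__":
    # Toy data: four molecules, four bits (illustrative only, not from the paper).
    fps = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
    scores = bit_surprisal(fps)
    kept = prune_by_score(scores, keep=2, highest=True)
    print(scores, kept, [project(fp, kept) for fp in fps])
```

In a study like the one summarized above, each pruned keyset would then be scored with a classification benchmark (the abstract cites the Briem and Lessel methodology) and compared against the random-pruning baseline at the same keyset length.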

Similar articles

1. Reoptimization of MDL keys for use in drug discovery.
   J Chem Inf Comput Sci. 2002 Nov-Dec;42(6):1273-80. doi: 10.1021/ci010132r.
2. Selecting diversified compounds to build a tangible library for biological and biochemical assays.
   Molecules. 2010 Jul 23;15(7):5031-44. doi: 10.3390/molecules15075031.
3. Power keys: a novel class of topological descriptors based on exhaustive subgraph enumeration and their application in substructure searching.
   J Chem Inf Model. 2011 Nov 28;51(11):2843-51. doi: 10.1021/ci200282z. Epub 2011 Oct 18.
4. Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures.
   Org Biomol Chem. 2004 Nov 21;2(22):3256-66. doi: 10.1039/B409865J. Epub 2004 Sep 29.
5. Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients.
   J Chem Inf Comput Sci. 2002 Nov-Dec;42(6):1407-14. doi: 10.1021/ci025531g.
6. Development of CYP3A4 inhibition models: comparisons of machine-learning techniques and molecular descriptors.
   J Biomol Screen. 2005 Apr;10(3):197-205. doi: 10.1177/1087057104274091.
7. Development of a fingerprint reduction approach for Bayesian similarity searching based on Kullback-Leibler divergence analysis.
   J Chem Inf Model. 2009 Jun;49(6):1347-58. doi: 10.1021/ci900087y.
8. Profile scaling increases the similarity search performance of molecular fingerprints containing numerical descriptors and structural keys.
   J Chem Inf Comput Sci. 2003 Jul-Aug;43(4):1218-25. doi: 10.1021/ci030287u.
9. How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space.
   J Chem Inf Model. 2014 Jan 27;54(1):230-42. doi: 10.1021/ci400469u. Epub 2013 Dec 13.
10. Discovering collectively informative descriptors from high-throughput experiments.
    BMC Bioinformatics. 2009 Dec 18;10:431. doi: 10.1186/1471-2105-10-431.

Cited by

1. Hot-Spot-Guided Generative Deep Learning for Drug-Like PPI Inhibitor Design.
   Interdiscip Sci. 2025 Sep 2. doi: 10.1007/s12539-025-00756-w.
2. A Comparative Evaluation of Machine Learning and Deep Graph Learning for Chemical Ecotoxicological Prediction.
   ACS Omega. 2025 Aug 12;10(33):37549-37560. doi: 10.1021/acsomega.5c03753. eCollection 2025 Aug 26.
3. Biosynfoni: a biosynthesis-informed and interpretable lightweight molecular fingerprint.
   J Cheminform. 2025 Aug 29;17(1):136. doi: 10.1186/s13321-025-01081-6.
4. TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining.
   Transact Mach Learn Res. 2025 Jun;2025.
5. SbD4Skin by EosCloud: Integrating multi-view molecular representation for predicting skin sensitization, irritation, and acute dermal toxicity.
   Comput Struct Biotechnol J. 2025 Aug 6;29:222-235. doi: 10.1016/j.csbj.2025.08.001. eCollection 2025.
6. Improving drug-induced liver injury prediction using graph neural networks with augmented graph features from molecular optimisation.
   J Cheminform. 2025 Aug 18;17(1):124. doi: 10.1186/s13321-025-01068-3.
7. Qsarna: An Online Tool for Smart Chemical Space Navigation in Drug Design.
   J Chem Inf Model. 2025 Aug 11;65(15):7811-7816. doi: 10.1021/acs.jcim.5c00720. Epub 2025 Jul 29.
8. MCST-AFN: A Multichannel Spatiotemporal Feature Adaptive Fusion Network Framework Based on a Low-Fidelity Molecular Dynamics Model.
   ACS Omega. 2025 Jul 11;10(28):30232-30249. doi: 10.1021/acsomega.5c01443. eCollection 2025 Jul 22.
9. Stacking Ensemble Neural Network for Chemical Safety Assessment: A Case Study of Thyroid Peroxidase and Natural Product Screening.
   ACS Omega. 2025 Jul 10;10(28):30450-30466. doi: 10.1021/acsomega.5c02188. eCollection 2025 Jul 22.
10. The topology of molecular representations and its influence on machine learning performance.
    J Cheminform. 2025 Jul 21;17(1):109. doi: 10.1186/s13321-025-01045-w.