Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance.

Authors

Bender Andreas, Mussa Hamse Y, Glen Robert C, Reiling Stephan

Affiliation

Unilever Centre for Molecular Science Informatics, Chemistry Department, University of Cambridge, Cambridge CB2 1EW, United Kingdom.

Publication

J Chem Inf Comput Sci. 2004 Sep-Oct;44(5):1708-18. doi: 10.1021/ci0498719.

DOI: 10.1021/ci0498719
PMID: 15446830
Abstract

A molecular similarity searching technique based on atom environments, information-gain-based feature selection, and the naive Bayesian classifier has been applied to a series of diverse datasets and its performance compared to those of alternative searching methods. Atom environments are count vectors of heavy atoms present at a topological distance from each heavy atom of a molecular structure. In this application, using a recently published dataset of more than 100000 molecules from the MDL Drug Data Report database, the atom environment approach appears to outperform fusion of ranking scores as well as binary kernel discrimination, which are both used in combination with Unity fingerprints. Overall retrieval rates among the top 5% of the sorted library are nearly 10% better (more than 14% better in relative numbers) than those of the second best method, Unity fingerprints and binary kernel discrimination. In 10 out of 11 sets of active compounds the combination of atom environments and the naive Bayesian classifier appears to be the superior method, while in the remaining dataset, data fusion and binary kernel discrimination in combination with Unity fingerprints is the method of choice. Binary kernel discrimination in combination with Unity fingerprints generally comes second in performance overall. The difference in performance can largely be attributed to the different molecular descriptors used. Atom environments outperform Unity fingerprints by a large margin if the combination of these descriptors with the Tanimoto coefficient is compared. The naive Bayesian classifier in combination with information-gain-based feature selection and selection of a sensible number of features performs about as well as binary kernel discrimination in experiments where these classification methods are compared. 
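The atom environment descriptor and the Tanimoto comparison mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (a list of atom labels plus a bond list), the use of plain element symbols as atom types, the maximum distance of 2 bonds, and the string encoding of each environment are all assumptions made for illustration.

```python
from collections import Counter, deque


def atom_environments(atom_types, bonds, max_dist=2):
    """Return one environment feature string per heavy atom.

    atom_types: atom labels, one per heavy atom (assumption: element symbols).
    bonds: (i, j) index pairs between heavy atoms.
    max_dist: largest topological distance considered (assumption: 2).
    """
    neighbors = {i: [] for i in range(len(atom_types))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    features = []
    for start in range(len(atom_types)):
        # Breadth-first search yields topological (bond-count) distances.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            a = queue.popleft()
            if dist[a] == max_dist:
                continue  # do not expand beyond the outermost layer
            for b in neighbors[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        # Count vector: how many atoms of each type sit at each distance.
        counts = Counter((d, atom_types[a]) for a, d in dist.items())
        features.append(
            ";".join(f"{d}-{t}-{n}" for (d, t), n in sorted(counts.items()))
        )
    return features


def tanimoto(features_a, features_b):
    """Tanimoto coefficient on the sets of distinct environment features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy heavy-atom graphs: ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = atom_environments(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = atom_environments(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
similarity = tanimoto(ethanol, propanol)
```

In this toy case only the environment centred on the oxygen is shared between the two molecules, which is why the similarity is low despite the obvious structural relationship; richer atom typing or a different layer depth would change the result.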
When used on a monoaminooxidase dataset, atom environments and the naive Bayesian classifier perform as well as binary kernel discrimination in the case of a 50/50 split of training and test compounds. In the case of sparse training data, binary kernel discrimination is found to be superior on this particular dataset. On a third dataset, the atom environment descriptor shows higher retrieval rates than other 2D fingerprints tested here when used in combination with the Tanimoto similarity coefficient. Feature selection is shown to be a crucial step in determining the performance of the algorithm. The representation of molecules by atom environments is found to be more effective than Unity fingerprints for the type of biological receptor similarity calculations examined here. Combining information prior to scoring and including information about inactive compounds, as in the Bayesian classifier and binary kernel discrimination, is found to be superior to posterior data fusion (in the datasets tested here).
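The abstract describes feature selection by information gain followed by naive Bayesian scoring that uses both active and inactive training compounds. A minimal sketch of that pipeline follows. The details are assumptions rather than the published method: molecules are represented as sets of feature strings, only features present in the query contribute to the score, and add-one smoothing is used to avoid zero probabilities.

```python
import math


def entropy(p):
    """Binary entropy in bits of a class with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(feature, actives, inactives):
    """Reduction in class entropy from splitting on feature presence."""
    n_act, n = len(actives), len(actives) + len(inactives)
    act_with = sum(feature in m for m in actives)
    n_with = act_with + sum(feature in m for m in inactives)
    n_without = n - n_with
    gain = entropy(n_act / n)
    if n_with:
        gain -= n_with / n * entropy(act_with / n_with)
    if n_without:
        gain -= n_without / n * entropy((n_act - act_with) / n_without)
    return gain


def select_features(actives, inactives, k):
    """Keep the k features with the highest information gain."""
    all_features = set().union(*actives, *inactives)
    ranked = sorted(
        all_features,
        key=lambda f: (-information_gain(f, actives, inactives), f),
    )
    return ranked[:k]


def bayes_score(molecule, selected, actives, inactives):
    """Log-odds of activity from the selected features present in molecule."""
    score = math.log(len(actives) / len(inactives))  # class prior ratio
    for f in selected:
        if f in molecule:
            # Add-one smoothing (assumption) keeps the ratio finite.
            p_act = (sum(f in m for m in actives) + 1) / (len(actives) + 2)
            p_inact = (sum(f in m for m in inactives) + 1) / (len(inactives) + 2)
            score += math.log(p_act / p_inact)
    return score


# Hypothetical training data: each molecule is a set of feature labels.
actives = [{"a", "b"}, {"a", "c"}]
inactives = [{"c", "d"}, {"d"}]
selected = select_features(actives, inactives, k=2)
score_like_active = bayes_score({"a", "x"}, selected, actives, inactives)
score_like_inactive = bayes_score({"d"}, selected, actives, inactives)
```

Ranking a library by this score and counting the actives recovered in the top fraction gives the kind of retrieval rate the abstract reports. The number of selected features `k` is the tuning parameter the abstract refers to when it calls feature selection a crucial step.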


Similar articles

1
Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance.
J Chem Inf Comput Sci. 2004 Sep-Oct;44(5):1708-18. doi: 10.1021/ci0498719.
2
Molecular similarity searching using atom environments, information-based feature selection, and a naïve Bayesian classifier.
J Chem Inf Comput Sci. 2004 Jan-Feb;44(1):170-8. doi: 10.1021/ci034207y.
3
Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints.
Chem Biol Drug Des. 2008 Jan;71(1):8-14. doi: 10.1111/j.1747-0285.2007.00602.x. Epub 2007 Dec 7.
4
How similar are similarity searching methods? A principal component analysis of molecular descriptor space.
J Chem Inf Model. 2009 Jan;49(1):108-19. doi: 10.1021/ci800249s.
5
Similarity metrics for ligands reflecting the similarity of the target proteins.
J Chem Inf Comput Sci. 2003 Mar-Apr;43(2):391-405. doi: 10.1021/ci025569t.
6
Molecular surface point environments for virtual screening and the elucidation of binding patterns (MOLPRINT 3D).
J Med Chem. 2004 Dec 16;47(26):6569-83. doi: 10.1021/jm049611i.
7
Searching for target-selective compounds using different combinations of multiclass support vector machine ranking methods, kernel functions, and fingerprint descriptors.
J Chem Inf Model. 2009 Mar;49(3):582-92. doi: 10.1021/ci800441c.
8
How do 2D fingerprints detect structurally diverse active compounds? Revealing compound subset-specific fingerprint features through systematic selection.
J Chem Inf Model. 2011 Sep 26;51(9):2254-65. doi: 10.1021/ci200275m. Epub 2011 Aug 8.
9
Virtual screening using binary kernel discrimination: effect of noisy training data and the optimization of performance.
J Chem Inf Model. 2006 Mar-Apr;46(2):478-86. doi: 10.1021/ci0505426.
10
Comparison of similarity coefficients for clustering and compound selection.
J Chem Inf Model. 2008 Mar;48(3):498-508. doi: 10.1021/ci700413a. Epub 2008 Feb 23.

Cited by

1
Discovery of Sphingosine-1-Phosphate Receptor Modulators as Potential CHI3L1 Inhibitors by Ligand-Based Virtual Screening and Molecular Dynamics Simulations.
ACS Omega. 2025 May 6;10(19):19992-20000. doi: 10.1021/acsomega.5c01968. eCollection 2025 May 20.
2
Molecular similarity: Theory, applications, and perspectives.
Artif Intell Chem. 2024 Dec;2(2). doi: 10.1016/j.aichem.2024.100077. Epub 2024 Aug 31.
3
Infrared spectrum analysis of organic molecules with neural networks using standard reference data sets in combination with real-world data.
J Cheminform. 2025 Feb 26;17(1):24. doi: 10.1186/s13321-025-00960-2.
4
Advancing efficiency in deep-blue OLEDs: Exploring a machine learning-driven multiresonance TADF molecular design.
Sci Adv. 2025 Jan 24;11(4):eadr1326. doi: 10.1126/sciadv.adr1326. Epub 2025 Jan 22.
5
Discovery of new antimicrobial thiophene derivatives with activity against drug-resistant Gram negative-bacteria.
Front Pharmacol. 2024 Aug 20;15:1412797. doi: 10.3389/fphar.2024.1412797. eCollection 2024.
6
Effectiveness of molecular fingerprints for exploring the chemical space of natural products.
J Cheminform. 2024 Mar 25;16(1):35. doi: 10.1186/s13321-024-00830-3.
7
Drug-target affinity prediction with extended graph learning-convolutional networks.
BMC Bioinformatics. 2024 Feb 16;25(1):75. doi: 10.1186/s12859-024-05698-6.
8
Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery.
Pharmaceuticals (Basel). 2023 Sep 6;16(9):1259. doi: 10.3390/ph16091259.
9
Artificial Intelligence and Machine Learning Technology Driven Modern Drug Discovery and Development.
Int J Mol Sci. 2023 Jan 19;24(3):2026. doi: 10.3390/ijms24032026.
10
A Guide to In Silico Drug Design.
Pharmaceutics. 2022 Dec 23;15(1):49. doi: 10.3390/pharmaceutics15010049.