Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection.

Authors

Miranda-Quintana Ramón Alain, Rácz Anita, Bajusz Dávid, Héberger Károly

Affiliations

Department of Chemistry, University of Florida, Gainesville, FL, 32603, USA.

Plasma Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary.

Publication information

J Cheminform. 2021 Apr 23;13(1):33. doi: 10.1186/s13321-021-00504-4.

DOI: 10.1186/s13321-021-00504-4
PMID: 33892799
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8067665/
Abstract

Despite being a central concept in cheminformatics, molecular similarity has so far been limited to the simultaneous comparison of only two molecules at a time using one index, generally the Tanimoto coefficient. In a recent contribution we not only introduced a complete mathematical framework for extended similarity calculations (i.e., comparisons of more than two molecules at a time), but also defined a series of novel indices. Part 1 is a detailed analysis of the effects of various parameters on the similarity values calculated by the extended formulas. Their features were revealed by sum of ranking differences and ANOVA. Here, in addition to characterizing several important aspects of the newly introduced similarity metrics, we highlight their applicability and utility in real-life scenarios using datasets with popular molecular fingerprints. Remarkably, for large datasets, the use of extended similarity measures provides an unprecedented speed-up over "traditional" pairwise similarity matrix calculations. We also provide illustrative examples of a more direct algorithm based on the extended Tanimoto similarity to select diverse compound sets, resulting in much higher levels of diversity than traditional approaches. We discuss the inner and outer consistency of our indices, which are key in practical applications, showing whether the n-ary and binary indices rank the data in the same way. We demonstrate the use of the new n-ary similarity metrics on t-distributed stochastic neighbor embedding (t-SNE) plots of datasets of varying diversity, or corresponding to ligands of different pharmaceutical targets, which show that our indices provide a better measure of set compactness than standard binary measures. We also present a conceptual example of the applicability of our indices in agglomerative hierarchical algorithms. The Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons.
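
The speed-up and the diversity selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a simplified, unweighted n-ary Tanimoto-style index computed from per-bit column sums, with a coincidence threshold of `n % 2`. The function names `extended_tanimoto` and `greedy_diverse_selection` are hypothetical. The point is that one pass over the column sums replaces the full pairwise matrix, and that a running column-sum vector lets a greedy picker score each candidate cheaply.

```python
import numpy as np


def extended_tanimoto(col_sums, n, threshold=None):
    """Simplified, unweighted n-ary Tanimoto-style index (illustrative only).

    col_sums[j] is the number of the n fingerprints that have bit j set.
    """
    col_sums = np.asarray(col_sums)
    if threshold is None:
        threshold = n % 2  # assumption: minimal coincidence threshold
    balance = 2 * col_sums - n  # > 0: mostly ones, < 0: mostly zeros
    one_sim = np.count_nonzero(balance > threshold)          # bits "on" in (nearly) all
    dissim = np.count_nonzero(np.abs(balance) <= threshold)  # bits with no clear majority
    denom = one_sim + dissim
    # Convention of this sketch: return 0.0 when no bit is set anywhere.
    return one_sim / denom if denom else 0.0


def greedy_diverse_selection(fps, n_select, start=0):
    """Greedily add the fingerprint that keeps the set's n-ary index lowest."""
    fps = np.asarray(fps, dtype=int)
    selected = [start]
    running = fps[start].copy()  # column sums of the selected set
    remaining = set(range(len(fps))) - {start}
    while len(selected) < n_select and remaining:
        n = len(selected) + 1
        # Ties are broken by the lowest index (sorted order).
        best = min(sorted(remaining),
                   key=lambda i: extended_tanimoto(running + fps[i], n))
        selected.append(best)
        running += fps[best]
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    fps = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [1, 0, 1, 0]])
    # Similarity of the whole set from a single column-sum pass.
    print(extended_tanimoto(fps.sum(axis=0), len(fps)))
    print(greedy_diverse_selection(fps, 2))
```

For two fingerprints the sketch reduces to the ordinary Tanimoto coefficient a / (a + b + c). The published indices additionally weight the similarity and dissimilarity counters, which this sketch omits.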


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/6e8f364fc0e9/13321_2021_504_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/ed432a14ba87/13321_2021_504_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/0fcdf695d78e/13321_2021_504_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/b48f9fa78811/13321_2021_504_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/1d8de53c3241/13321_2021_504_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/80f79bae24e3/13321_2021_504_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/a50fcfd8e9ea/13321_2021_504_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/2c84dc2e0d08/13321_2021_504_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef1e/8067665/0b2d98aadce2/13321_2021_504_Fig9_HTML.jpg

Similar articles

1
Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection.
J Cheminform. 2021 Apr 23;13(1):33. doi: 10.1186/s13321-021-00504-4.
2
Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics.
J Cheminform. 2021 Apr 23;13(1):32. doi: 10.1186/s13321-021-00505-3.
3
Extended many-item similarity indices for sets of nucleotide and protein sequences.
Comput Struct Biotechnol J. 2021 Jun 16;19:3628-3639. doi: 10.1016/j.csbj.2021.06.021. eCollection 2021.
4
Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?
J Cheminform. 2015 May 20;7:20. doi: 10.1186/s13321-015-0069-3. eCollection 2015.
5
Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints.
J Cheminform. 2018 Oct 4;10(1):48. doi: 10.1186/s13321-018-0302-y.
6
Extended continuous similarity indices: theory and application for QSAR descriptor selection.
J Comput Aided Mol Des. 2022 Mar;36(3):157-173. doi: 10.1007/s10822-022-00444-7. Epub 2022 Mar 15.
7
Exploring activity landscapes with extended similarity: is Tanimoto enough?
Mol Inform. 2023 Jul;42(7):e2300056. doi: 10.1002/minf.202300056. Epub 2023 Jun 7.
8
Differential Consistency Analysis: Which Similarity Measures can be Applied in Drug Discovery?
Mol Inform. 2021 Jul;40(7):e2060017. doi: 10.1002/minf.202060017. Epub 2021 Apr 23.
9
Molecular Dynamics Simulations and Diversity Selection by Extended Continuous Similarity Indices.
J Chem Inf Model. 2022 Jul 25;62(14):3415-3425. doi: 10.1021/acs.jcim.2c00433. Epub 2022 Jul 14.
10
iSIM: instant similarity.
Digit Discov. 2024 May 7;3(6):1160-1171. doi: 10.1039/d4dd00041b. eCollection 2024 Jun 12.

Cited by

1
Undersampling techniques for non-linear chemical space visualization.
bioRxiv. 2025 Jul 7:2025.07.03.663077. doi: 10.1101/2025.07.03.663077.
2
Scaling k-Means for Multi-Million Frames: A Stratified NANI Approach for Large-Scale MD Simulations.
bioRxiv. 2025 Jun 18:2025.06.15.659780. doi: 10.1101/2025.06.15.659780.
3
iCliff Taylor's Version: Robust and Efficient Activity Cliff Determination.
J Chem Inf Model. 2025 Jun 9;65(11):5801-5810. doi: 10.1021/acs.jcim.5c00506. Epub 2025 May 21.
4
SHINE: Deterministic Many-to-Many Clustering of Molecular Pathways.
J Chem Inf Model. 2025 May 26;65(10):4775-4782. doi: 10.1021/acs.jcim.5c00240. Epub 2025 May 6.
5
Extended Quality (eQual): Radial Threshold Clustering Based on n-ary Similarity.
J Chem Inf Model. 2025 May 26;65(10):5062-5070. doi: 10.1021/acs.jcim.4c02341. Epub 2025 May 1.
6
Hierarchical Extended Linkage Method (HELM)'s Deep Dive into Hybrid Clustering Strategies.
bioRxiv. 2025 Mar 10:2025.03.05.641742. doi: 10.1101/2025.03.05.641742.
7
iCliff Taylor's version: Robust and Efficient Activity Cliff Determination.
bioRxiv. 2025 Mar 13:2025.03.09.642269. doi: 10.1101/2025.03.09.642269.
8
Molecular similarity: Theory, applications, and perspectives.
Artif Intell Chem. 2024 Dec;2(2). doi: 10.1016/j.aichem.2024.100077. Epub 2024 Aug 31.
9
BitBIRCH: efficient clustering of large molecular libraries.
Digit Discov. 2025 Mar 13;4(4):1042-1051. doi: 10.1039/d5dd00030k. eCollection 2025 Apr 9.
10
CADENCE: Clustering Algorithm - Density-based Exploration and Novelty Clustering with Efficiency.
bioRxiv. 2025 Feb 28:2025.02.24.639863. doi: 10.1101/2025.02.24.639863.

References

1
Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics.
J Cheminform. 2021 Apr 23;13(1):32. doi: 10.1186/s13321-021-00505-3.
2
Differential Consistency Analysis: Which Similarity Measures can be Applied in Drug Discovery?
Mol Inform. 2021 Jul;40(7):e2060017. doi: 10.1002/minf.202060017. Epub 2021 Apr 23.
3
An electrophilic warhead library for mapping the reactivity and accessibility of tractable cysteines in protein kinases.
Eur J Med Chem. 2020 Dec 1;207:112836. doi: 10.1016/j.ejmech.2020.112836. Epub 2020 Sep 12.
4
Similar, or dissimilar, that is the question. How different are methods for comparison of compounds similarity?
Comput Biol Chem. 2020 Oct;88:107367. doi: 10.1016/j.compbiolchem.2020.107367. Epub 2020 Aug 26.
5
Large-scale evaluation of cytochrome P450 2C9 mediated drug interaction potential with machine learning-based consensus modeling.
J Comput Aided Mol Des. 2020 Aug;34(8):831-839. doi: 10.1007/s10822-020-00308-y. Epub 2020 Mar 27.
6
The impact of binding site waters on the activity/selectivity trade-off of Janus kinase 2 (JAK2) inhibitors.
Bioorg Med Chem. 2019 Apr 15;27(8):1497-1508. doi: 10.1016/j.bmc.2019.02.029. Epub 2019 Feb 16.
7
Statistical-based database fingerprint: chemical space dependent representation of compound databases.
J Cheminform. 2018 Nov 22;10(1):55. doi: 10.1186/s13321-018-0311-x.
8
ZINClick v.18: Expanding Chemical Space of 1,2,3-Triazoles.
J Chem Inf Model. 2019 May 28;59(5):1697-1702. doi: 10.1021/acs.jcim.8b00615. Epub 2018 Nov 27.
9
Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints.
J Cheminform. 2018 Oct 4;10(1):48. doi: 10.1186/s13321-018-0302-y.
10
Database fingerprint (DFP): an approach to represent molecular databases.
J Cheminform. 2017 Feb 6;9:9. doi: 10.1186/s13321-017-0195-1. eCollection 2017.