Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics.

Authors

Miranda-Quintana Ramón Alain, Bajusz Dávid, Rácz Anita, Héberger Károly

Affiliations

Department of Chemistry, University of Florida, Gainesville, FL, 32603, USA.

Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Magyar tudósok krt. 2, 1117, Budapest, Hungary.

Publication information

J Cheminform. 2021 Apr 23;13(1):32. doi: 10.1186/s13321-021-00505-3.

DOI:10.1186/s13321-021-00505-3
PMID:33892802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8067658/
Abstract

Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and similarity metrics, there were no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules). The present study bridges this gap by introducing a straightforward computational framework for comparing multiple objects at the same time and by providing extended formulas for as many similarity metrics as possible. In the binary case (i.e., when comparing two molecules pairwise), these naturally reduce to their well-known formulas. We provide a detailed analysis of the effects of various parameters on the similarity values calculated by the extended formulas. The extended similarity indices are entirely general and do not depend on the fingerprints used. Two types of analysis of variance (ANOVA) help to understand the main features of the indices: (i) ANOVA of mean similarity indices; (ii) ANOVA of sum of ranking differences (SRD). Practical aspects and applications of the extended similarity indices are detailed in the accompanying paper: Miranda-Quintana et al. J Cheminform. 2021. https://doi.org/10.1186/s13321-021-00504-4 . Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons .
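
To make the idea of comparing several fingerprints at once more concrete, here is a minimal Python sketch of a Tanimoto-type index computed over n binary fingerprints in one pass. It is not the authors' implementation (see their repository for that). The function names, the default coincidence threshold of `n % 2`, and the fraction-style weights are assumptions made for illustration. The sketch counts, for each bit position, how many fingerprints have the bit set, and classifies the position as "similar" or "dissimilar" depending on how far that count is from an even split. For n = 2 the result coincides with the ordinary pairwise Tanimoto coefficient, in line with the abstract's statement that the extended formulas reduce to the well-known ones.

```python
from typing import Optional, Sequence


def pairwise_tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Ordinary Tanimoto coefficient for two binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def extended_tanimoto(fingerprints: Sequence[Sequence[int]],
                      threshold: Optional[int] = None) -> float:
    """Tanimoto-type index over n fingerprints (hypothetical helper)."""
    n = len(fingerprints)
    if n < 2:
        raise ValueError("need at least two fingerprints")
    length = len(fingerprints[0])
    if any(len(fp) != length for fp in fingerprints):
        raise ValueError("fingerprints must have equal length")

    # Assumption: default coincidence threshold is n mod 2.
    gamma = n % 2 if threshold is None else threshold

    sim_ones = 0.0  # weighted count of positions where "on" bits dominate
    dissim = 0.0    # weighted count of positions without a clear majority

    for column in zip(*fingerprints):
        k = sum(column)         # fingerprints with this bit set
        delta = abs(2 * k - n)  # distance from an even split
        if delta > gamma:
            if 2 * k > n:
                sim_ones += delta / n  # assumed similarity weight
            # positions dominated by zeros are left out of the ratio
        else:
            dissim += 1 - (delta - n % 2) / n  # assumed dissimilarity weight

    denominator = sim_ones + dissim
    return sim_ones / denominator if denominator else 0.0


if __name__ == "__main__":
    fp_a = [1, 1, 0, 1, 0, 0]
    fp_b = [1, 0, 0, 1, 1, 0]
    fp_c = [1, 1, 0, 0, 1, 0]
    print(pairwise_tanimoto(fp_a, fp_b))
    print(extended_tanimoto([fp_a, fp_b]))
    print(extended_tanimoto([fp_a, fp_b, fp_c]))
```

Because the work is done on per-position counts rather than on all pairs, the cost of this sketch grows linearly with the number of fingerprints.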

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/ebc6ee1fef32/13321_2021_505_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/5fc09782872f/13321_2021_505_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/9e6854202113/13321_2021_505_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/2a4dd89acc8b/13321_2021_505_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/20393ed2b064/13321_2021_505_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/64e679f86b9b/13321_2021_505_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/daf2349585e6/13321_2021_505_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/cf3cb44092e8/13321_2021_505_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/c4cb899ac367/13321_2021_505_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/6c91177fb02a/13321_2021_505_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce33/8067658/6ba4f82a3648/13321_2021_505_Fig11_HTML.jpg
