iSIM-sigma: efficient standard deviation calculation for molecular similarity.

Authors

Perez Kenneth Lopez, Zhao Bill, Quintana Ramon Alain Miranda

Publication information

bioRxiv. 2024 Nov 26:2024.11.24.625084. doi: 10.1101/2024.11.24.625084.

DOI:10.1101/2024.11.24.625084
PMID:39651185
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11623521/
Abstract

The average and variance of the molecular similarities in a set are valuable information for cheminformatics tasks such as chemical space exploration and subset selection. However, calculating the variance of the complete similarity matrix has quadratic complexity, $O(N^2)$. As molecular libraries keep growing, this pairwise approach becomes infeasible. In this work, we present an alternative that obtains the exact standard deviation of the molecular similarities in a set (with $N$ molecules and $M$ features) for the Russell-Rao (RR) and Sokal-Michener (SM) similarity indexes in $O(NM^2)$ complexity. Additionally, we present a highly accurate approximation with linear complexity, $O(N)$, based on sampling representative molecules from the set. The proposed approximation can be extended to other similarity indexes, including the popular Jaccard-Tanimoto (JT) index. By sampling only 50 molecules, the proposed method estimates the standard deviation of the similarities in a set with an RMSE below 0.01 for sets of up to 50,000 molecules. In comparison, random sampling does not guarantee a good approximation, as shown in our results.
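To make the idea of an exact, pair-free standard deviation concrete, here is a minimal NumPy sketch for the Russell-Rao index only. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, fingerprints are assumed to be a binary N x M array, and the standard deviation is taken over the N(N-1)/2 distinct pairs (population form). The sketch relies on the identity that the sum of squared pairwise dot products equals the squared Frobenius norm of the M x M feature co-occurrence matrix, so the cost grows linearly with the number of molecules and quadratically with the number of features.

```python
import numpy as np

def rr_std_pairwise(X):
    """Brute-force reference: builds the full N x N similarity matrix."""
    N, M = X.shape
    X = X.astype(np.float64)
    S = (X @ X.T) / M                      # Russell-Rao similarity for every pair
    upper = np.triu_indices(N, k=1)        # distinct pairs i < j only
    return S[upper].std()                  # population std (ddof=0)

def rr_std_feature_space(X):
    """Same quantity without enumerating pairs (hypothetical helper).

    Assumes X is binary, so the self dot product of a row equals its row sum.
    """
    N, M = X.shape
    X = X.astype(np.float64)
    n_pairs = N * (N - 1) / 2

    col = X.sum(axis=0)                    # per-feature "on" counts
    row = X.sum(axis=1)                    # per-molecule "on" counts

    # Sum of similarities over pairs: each feature contributes C(col_k, 2).
    sum_s = (col * (col - 1) / 2).sum() / M

    # Sum of squared similarities: ||X^T X||_F^2 covers all ordered pairs,
    # including i == j, so remove the diagonal terms and halve.
    C = X.T @ X                            # M x M co-occurrence matrix
    sum_s2 = ((C ** 2).sum() - (row ** 2).sum()) / 2 / M**2

    mean = sum_s / n_pairs
    var = sum_s2 / n_pairs - mean**2
    return np.sqrt(max(var, 0.0))          # guard against tiny negative round-off

# Small demonstration on random binary fingerprints (synthetic data).
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(300, 32))
print(rr_std_pairwise(fingerprints), rr_std_feature_space(fingerprints))
```

The two printed values should agree to floating-point precision. The brute-force version is kept only as a correctness check; for large libraries only the feature-space version avoids the N x N matrix.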


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/fab345631df2/nihpp-2024.11.24.625084v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/a6855fb833ce/nihpp-2024.11.24.625084v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/4a4fde439e82/nihpp-2024.11.24.625084v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/87e111a0b58a/nihpp-2024.11.24.625084v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/78290fe15850/nihpp-2024.11.24.625084v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/bab582958dd0/nihpp-2024.11.24.625084v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/beefa60715b9/nihpp-2024.11.24.625084v1-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/70c8a7892a66/nihpp-2024.11.24.625084v1-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cd3/11623521/a7bd2956f7de/nihpp-2024.11.24.625084v1-f0009.jpg

Similar articles

1
iSIM-sigma: efficient standard deviation calculation for molecular similarity.
bioRxiv. 2024 Nov 26:2024.11.24.625084. doi: 10.1101/2024.11.24.625084.
2
iSIM-Sigma: Efficient Standard Deviation Calculation for Molecular Similarity.
J Chem Inf Model. 2025 Jul 14;65(13):6797-6808. doi: 10.1021/acs.jcim.5c00894. Epub 2025 Jun 17.