Impact of similarity threshold on the topology of molecular similarity networks and clustering outcomes.

Authors

Zahoránszky-Kőhalmi Gergely, Bologa Cristian G, Oprea Tudor I

Affiliation

Translational Informatics Division, University of New Mexico School of Medicine, MSC09 5025, Albuquerque, NM 87131 USA.

Publication

J Cheminform. 2016 Mar 30;8:16. doi: 10.1186/s13321-016-0127-5. eCollection 2016.

DOI:10.1186/s13321-016-0127-5
PMID:27030802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4812625/
Abstract

BACKGROUND

Methods based on complex network theory and the emergence of "Big Data" have reshaped the study of structure-activity relationships of molecules. This shift gave rise to new methods that face an important challenge: how to restructure a large molecular dataset into a network that best serves the subsequent analyses. Focusing on network clustering, our study addresses this open question by proposing a data transformation method and a clustering framework.

RESULTS

Using the WOMBAT and PubChem MLSMR datasets we investigated the relation between varying the similarity threshold applied on the similarity matrix and the average clustering coefficient of the emerging similarity-based networks. These similarity networks were then clustered with the InfoMap algorithm. We devised a systematic method to generate so-called "pseudo-reference" clustering datasets which compensate for the lack of large-scale reference datasets. With help from the clustering framework we were able to observe the effects of varying the similarity threshold and its consequence on the average clustering coefficient and the clustering performance.
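The thresholding step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that molecules are represented as sets of "on" fingerprint bits and compared with Tanimoto similarity; the abstract does not name the fingerprint or similarity measure, and the helper names `tanimoto` and `similarity_network` as well as the toy data are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_network(fingerprints, threshold):
    """Build an undirected network as adjacency sets.

    An edge is kept only when the pairwise similarity is >= threshold.
    """
    adjacency = {i: set() for i in range(len(fingerprints))}
    for i, j in combinations(range(len(fingerprints)), 2):
        if tanimoto(fingerprints[i], fingerprints[j]) >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def edge_count(adjacency):
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


# Made-up fingerprints (not real molecules), only to show the effect.
toy_fingerprints = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 5},
    {7, 8, 9},
    {7, 8, 10},
]

for t in (0.3, 0.7):
    print(t, edge_count(similarity_network(toy_fingerprints, t)))
```

Raising the threshold removes the weaker edges, so the same similarity matrix yields a sparser network. The all-pairs loop is quadratic in the number of molecules and is meant only for small illustrations, not for datasets of the size discussed here.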

CONCLUSIONS

We observed that the average clustering coefficient versus similarity threshold function can be characterized by the presence of a peak that covers a range of similarity threshold values. This peak is preceded by a steep decline in the number of edges of the similarity network. The maximum of this peak is well aligned with the best clustering outcome. Thus, if no reference set is available, choosing the similarity threshold associated with this peak would be a near-ideal setting for the subsequent network cluster analysis. The proposed method can be used as a general approach to determine the appropriate similarity threshold to generate the similarity network of large-scale molecular datasets.
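The threshold selection heuristic stated in the conclusions can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy similarity matrix, and the rule used to locate the peak (the highest interior local maximum of the curve) are all hypothetical choices. The interior restriction is there because a very low threshold yields a complete graph whose average clustering coefficient is trivially 1.

```python
def threshold_network(sim, threshold):
    """Adjacency sets for pairs whose similarity is >= threshold."""
    n = len(sim)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Count edges that connect two neighbors of the current node.
        links = sum(
            1 for u in neighbors for v in neighbors
            if u < v and v in adjacency[u]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency) if adjacency else 0.0


def acc_curve(sim, thresholds):
    """Average clustering coefficient at each threshold (ascending order)."""
    return [(t, average_clustering(threshold_network(sim, t))) for t in thresholds]


def peak_threshold(curve):
    """Threshold of the highest interior local maximum, or None if absent."""
    best = None
    for (_, prev), (t, acc), (_, nxt) in zip(curve, curve[1:], curve[2:]):
        if acc > prev and acc >= nxt and (best is None or acc > best[1]):
            best = (t, acc)
    return best[0] if best else None


# Made-up similarity matrix: a tight triangle (0, 1, 2) plus a loose chain 0-3-4.
toy_sim = [
    [1.0, 0.9, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.5, 0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.1, 0.5, 1.0],
]

curve = acc_curve(toy_sim, [0.1, 0.3, 0.7, 0.95])
print(curve)
print(peak_threshold(curve))
```

In this toy case the edge count falls from 10 to 5 to 3 before the coefficient rises again, which mirrors the qualitative pattern the conclusions describe. On real data the threshold grid would need to be much finer, and the selected threshold would then be passed to a clustering algorithm such as InfoMap.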

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5941/4812625/5a88bbb4f58d/13321_2016_127_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5941/4812625/1fdd59013631/13321_2016_127_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5941/4812625/0e610af74aa3/13321_2016_127_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5941/4812625/e3a635507fb5/13321_2016_127_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5941/4812625/bd94fc90777d/13321_2016_127_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5941/4812625/4a4f2f86afbc/13321_2016_127_Fig6_HTML.jpg
