
Energy-based clustering: Fast and robust clustering of data with known likelihood functions.

Affiliation

Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland.

Publication information

J Chem Phys. 2023 Jul 14;159(2). doi: 10.1063/5.0148735.

DOI:10.1063/5.0148735
PMID:37428043
Abstract

Clustering has become an indispensable tool in the presence of increasingly large and complex datasets. Most clustering algorithms depend, either explicitly or implicitly, on the sampled density. However, estimated densities are fragile due to the curse of dimensionality and finite sampling effects, for instance, in molecular dynamics simulations. To avoid the dependence on estimated densities, an energy-based clustering (EBC) algorithm based on the Metropolis acceptance criterion is developed in this work. In the proposed formulation, EBC can be considered a generalization of spectral clustering in the limit of large temperatures. Taking the potential energy of a sample explicitly into account alleviates requirements regarding the distribution of the data. In addition, it permits the subsampling of densely sampled regions, which can result in significant speed-ups and sublinear scaling. The algorithm is validated on a range of test systems including molecular dynamics trajectories of alanine dipeptide and the Trp-cage miniprotein. Our results show that including information about the potential-energy surface can largely decouple clustering from the sampled density.
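
The abstract gives no formulas, so the following is only a minimal sketch of the standard Metropolis acceptance criterion it names, not the authors' implementation. It assumes a hypothetical neighbor graph over the samples (`neighbors`), known potential energies (`energies`), and a thermal energy `kT`; the function names are illustrative. A move from sample i to a neighbor j is proposed uniformly and accepted with probability min(1, exp(-(E_j - E_i)/kT)), and rejected moves stay at i.

```python
import math


def metropolis_acceptance(e_from, e_to, kT):
    """Standard Metropolis acceptance: min(1, exp(-(e_to - e_from) / kT))."""
    delta = e_to - e_from
    if delta <= 0.0:
        return 1.0  # downhill or flat moves are always accepted
    return math.exp(-delta / kT)


def transition_matrix(energies, neighbors, kT):
    """Row-stochastic matrix of a Metropolis walk on a neighbor graph.

    Assumption (not from the source): neighbors[i] lists the indices
    adjacent to sample i and does not contain i itself.
    """
    n = len(energies)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = neighbors[i]
        for j in nbrs:
            # Uniform proposal among neighbors, then Metropolis acceptance.
            accept = metropolis_acceptance(energies[i], energies[j], kT)
            T[i][j] = accept / len(nbrs)
        # Rejected proposals keep the walker at i.
        T[i][i] = 1.0 - sum(T[i])
    return T


# Toy chain of four samples with a high-energy sample at index 2.
energies = [0.0, 0.1, 5.0, 0.2]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

T_cold = transition_matrix(energies, neighbors, kT=1.0)
T_hot = transition_matrix(energies, neighbors, kT=1.0e6)
```

In this toy setup, the low-temperature matrix suppresses the step from sample 1 onto the high-energy sample 2, while at very large `kT` every acceptance probability approaches 1 and the matrix reduces to a plain random walk on the neighbor graph. That is one way to read the abstract's remark that EBC generalizes spectral clustering in the high-temperature limit, since spectral clustering operates on a random-walk-type operator built from the graph alone.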


Similar articles

1. Energy-based clustering: Fast and robust clustering of data with known likelihood functions.
J Chem Phys. 2023 Jul 14;159(2). doi: 10.1063/5.0148735.
2. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3. Fast conformational clustering of extensive molecular dynamics simulation data.
J Chem Phys. 2023 Apr 14;158(14):144109. doi: 10.1063/5.0142797.
4. Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms.
J Chem Theory Comput. 2007 Nov;3(6):2312-34. doi: 10.1021/ct700119m.
5. Volume-scaled common nearest neighbor clustering algorithm with free-energy hierarchy.
J Chem Phys. 2021 Feb 28;154(8):084106. doi: 10.1063/5.0025797.
6. Adaptive partitioning by local density-peaks: An efficient density-based clustering algorithm for analyzing molecular dynamics trajectories.
J Comput Chem. 2017 Jan 30;38(3):152-160. doi: 10.1002/jcc.24664. Epub 2016 Nov 21.
7. Clustering molecular dynamics trajectories for optimizing docking experiments.
Comput Intell Neurosci. 2015;2015:916240. doi: 10.1155/2015/916240. Epub 2015 Mar 22.
8. Evolutionary Multiobjective Clustering and Its Applications to Patient Stratification.
IEEE Trans Cybern. 2019 May;49(5):1680-1693. doi: 10.1109/TCYB.2018.2817480. Epub 2018 Apr 2.
9. LRT-CLUSTER: A New Clustering Algorithm Based on Likelihood Ratio Test to Identify Driving Genes.
Interdiscip Sci. 2023 Jun;15(2):217-230. doi: 10.1007/s12539-023-00554-2. Epub 2023 Feb 27.
10. Validating clustering of molecular dynamics simulations using polymer models.
BMC Bioinformatics. 2011 Nov 14;12:445. doi: 10.1186/1471-2105-12-445.

Cited by

1. Understanding and Quantifying Molecular Flexibility: Torsion Angular Bin Strings.
J Chem Inf Model. 2024 Oct 28;64(20):7917-7924. doi: 10.1021/acs.jcim.4c01513. Epub 2024 Oct 10.
2. A general graph neural network based implicit solvation model for organic molecules in water.
Chem Sci. 2024 Jun 19;15(28):10794-10802. doi: 10.1039/d4sc02432j. eCollection 2024 Jul 17.
3. Analytical Framework to Understand the Origins of Methyl Side-Chain Dynamics in Protein Assemblies.
J Am Chem Soc. 2024 Mar 27;146(12):8164-8178. doi: 10.1021/jacs.3c12620. Epub 2024 Mar 13.