Energy-based clustering: Fast and robust clustering of data with known likelihood functions.

Affiliation

Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland.

Publication information

J Chem Phys. 2023 Jul 14;159(2). doi: 10.1063/5.0148735.

Abstract

Clustering has become an indispensable tool in the presence of increasingly large and complex datasets. Most clustering algorithms depend, either explicitly or implicitly, on the sampled density. However, estimated densities are fragile due to the curse of dimensionality and finite sampling effects, for instance, in molecular dynamics simulations. To avoid the dependence on estimated densities, an energy-based clustering (EBC) algorithm based on the Metropolis acceptance criterion is developed in this work. In the proposed formulation, EBC can be considered a generalization of spectral clustering in the limit of large temperatures. Taking the potential energy of a sample explicitly into account alleviates requirements regarding the distribution of the data. In addition, it permits the subsampling of densely sampled regions, which can result in significant speed-ups and sublinear scaling. The algorithm is validated on a range of test systems including molecular dynamics trajectories of alanine dipeptide and the Trp-cage miniprotein. Our results show that including information about the potential-energy surface can largely decouple clustering from the sampled density.
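
To make the role of the Metropolis acceptance criterion concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical pairwise affinity built as a Gaussian distance kernel multiplied by the Metropolis acceptance probability min(1, exp(-ΔE/kT)). The function names, the `sigma` bandwidth, and the product form are illustrative assumptions; the abstract does not specify the actual construction.

```python
import numpy as np


def metropolis_acceptance(e_from, e_to, kT):
    """Metropolis acceptance probability min(1, exp(-(e_to - e_from) / kT)).

    Clipping the energy difference at zero gives the same result as the
    min(1, ...) form and avoids overflow for strongly downhill moves.
    """
    uphill = np.maximum(e_to - e_from, 0.0)
    return np.exp(-uphill / kT)


def energy_weighted_affinity(points, energies, kT, sigma=1.0):
    """Hypothetical affinity (an assumption, not taken from the paper).

    Entry [i, j] is a Gaussian kernel on the distance between samples i and j
    multiplied by the Metropolis acceptance probability of a move i -> j.
    """
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * sigma**2))
    accept = metropolis_acceptance(energies[:, None], energies[None, :], kT)
    return kernel * accept


# Toy example: three samples on a 1D double-well potential E(x) = (x^2 - 1)^2.
points = np.array([[-1.0], [0.0], [1.0]])
energies = (points[:, 0] ** 2 - 1.0) ** 2

cold = energy_weighted_affinity(points, energies, kT=0.1)  # barrier suppresses uphill moves
hot = energy_weighted_affinity(points, energies, kT=1e9)   # acceptance factor tends to 1
```

In this sketch the acceptance factor tends to 1 as `kT` grows, so the affinity reduces to the purely geometric kernel of the kind used in spectral clustering, which is consistent with the high-temperature limit described in the abstract. At low `kT`, moves that climb the barrier at x = 0 are strongly down-weighted regardless of how densely each well was sampled.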
