

Hierarchical Extended Linkage Method (HELM)'s Deep Dive into Hybrid Clustering Strategies.

Authors

Chen Lexin, Brylle Woody Santos Jherome, Gaza Jokent, Perez Alberto, Miranda-Quintana Ramón Alain

Affiliation

Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville 32611, Florida, United States.

Publication information

J Chem Inf Model. 2025 Jun 23;65(12):6209-6220. doi: 10.1021/acs.jcim.5c00539. Epub 2025 Jun 2.

Abstract

Clustering remains a key tool in the analysis of molecular dynamics (MD) simulations, from the preparation of kinetic models to the study of mechanistic pathways and structure determination. It is no surprise, then, that multiple algorithms are currently used in the MD community, with k-means and hierarchical clustering being arguably the two most popular. The former is very attractive from a purely computational point of view, demanding minimal memory and time, but at the price of partitioning the data only in very restrictive ways. Hierarchical strategies, on the other hand, can generate arbitrary partitions, but with steep memory and time requirements because they must build a pairwise distance matrix over all the conformations/frames considered. Here we propose a new hybrid paradigm, the hierarchical extended linkage method (HELM), that retains the efficiency of k-means while incorporating the flexibility of hierarchical methods. The key ingredient is the use of n-ary difference functions to stabilize the k-means results and efficiently build the hierarchy of subsets. We showcase the applicability of this strategy in protein-DNA and protein folding studies, including the complete analysis of simulations with over 1.5 million frames. HELM is freely available in our MDANCE clustering package.
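The hybrid idea described in the abstract can be illustrated with a minimal sketch. This is not the MDANCE API or the authors' implementation: every function name below is hypothetical, the k-means seeding is a deliberately simple deterministic choice, and the mean squared deviation from the centroid is used only as an assumed stand-in for a difference function evaluated on a whole set of frames at once. What the sketch shows is that such a set-level measure can be computed from per-cluster summaries (count, coordinate sums, sums of squares), so k-means clusters can be merged hierarchically without a pairwise distance matrix over all frames.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(points, k, iters=10):
    """Plain Lloyd's k-means. Seeding with evenly spaced points is a
    simplification chosen for reproducibility (an assumption of this sketch)."""
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def summarize(members):
    """Cluster summary: (count, per-feature sums, per-feature sums of squares)."""
    cols = list(zip(*members))
    return (len(members), [sum(c) for c in cols], [sum(x * x for x in c) for c in cols])

def merge(a, b):
    """Summary of the union of two disjoint clusters: the summaries simply add."""
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])

def msd(summary):
    """Mean squared deviation from the centroid, from the summary alone.
    Used here as an illustrative set-level difference measure (assumption)."""
    n, s, sq = summary
    return sum(q / n - (t / n) ** 2 for t, q in zip(s, sq))

def hybrid_cluster(points, k, target):
    """Run k-means with k clusters, then greedily merge the pair of clusters
    whose union has the lowest msd until only `target` clusters remain."""
    labels = kmeans_labels(points, k)
    clusters = {}
    for c in sorted(set(labels)):  # skip empty k-means clusters
        clusters[c] = summarize([p for p, lab in zip(points, labels) if lab == c])
    groups = {c: [c] for c in clusters}
    while len(clusters) > target:
        pairs = [(i, j) for i in clusters for j in clusters if i < j]
        a, b = min(pairs, key=lambda ij: msd(merge(clusters[ij[0]], clusters[ij[1]])))
        clusters[a] = merge(clusters[a], clusters.pop(b))
        groups[a] += groups.pop(b)
    final = {c: g for g, members in enumerate(groups.values()) for c in members}
    return [final[lab] for lab in labels]

# Toy data: two well-separated 3x3 grids of points standing in for frames.
blob_a = [[x * 0.5, y * 0.5] for x in range(3) for y in range(3)]
blob_b = [[10 + x * 0.5, 10 + y * 0.5] for x in range(3) for y in range(3)]
toy_labels = hybrid_cluster(blob_a + blob_b, k=4, target=2)
print(toy_labels)
```

In this sketch each merge step compares only the k-means clusters with one another, so its cost depends on the number of clusters and not on the number of frames. The merge criterion used by HELM itself may differ from the one assumed here.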

