k-Means NANI: an improved clustering algorithm for Molecular Dynamics simulations.

Author information

Chen Lexin, Roe Daniel R, Kochert Matthew, Simmerling Carlos, Miranda-Quintana Ramón Alain

Affiliations

Department of Chemistry, University of Florida, FL, USA.

Quantum Theory Project, University of Florida, FL, USA.

Publication information

bioRxiv. 2024 Mar 8:2024.03.07.583975. doi: 10.1101/2024.03.07.583975.

Abstract

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. k-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient N-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse datasets and be used as a standalone tool or as part of our MDANCE clustering package.
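
The two-stage idea described in the abstract (restrict attention to high-density regions, then pick seeds that differ from one another, with no random draws) can be illustrated with a minimal sketch. This is not the authors' NANI implementation and not the MDANCE API. The function name `deterministic_seeds`, the `dense_fraction` parameter, and the use of mean squared pairwise distance as a crude density proxy in place of N-ary comparisons are all assumptions made for illustration.

```python
import numpy as np


def deterministic_seeds(X, k, dense_fraction=0.1):
    """Hypothetical helper: choose k initial centroids without randomness.

    Stage 1 keeps the points with the smallest mean squared distance to all
    other points (an assumed stand-in for a high-density filter).
    Stage 2 picks mutually distant points from that subset with a greedy
    farthest-point (max-min) rule.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Squared Euclidean distances between all pairs of points.
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negative round-off values

    # Stage 1: indices of the most "central" points, in a stable order.
    n_dense = max(k, int(np.ceil(dense_fraction * n)))
    dense = np.argsort(d2.mean(axis=1), kind="stable")[:n_dense]

    # Stage 2: start from the most central point, then repeatedly add the
    # candidate whose nearest already-chosen seed is farthest away.
    chosen = [dense[0]]
    min_d2 = d2[dense, dense[0]].copy()
    for _ in range(k - 1):
        nxt = dense[int(np.argmax(min_d2))]
        chosen.append(nxt)
        min_d2 = np.minimum(min_d2, d2[dense, nxt])

    return X[chosen]
```

Because every step is a sort or an argmax over fixed data, repeated calls on the same input return the same seeds, which is the reproducibility property the abstract contrasts with k-means++. The resulting array could then be handed to a k-means implementation that accepts explicit initial centroids, for example scikit-learn's `KMeans` with `init=seeds` and `n_init=1`. The all-pairs distance matrix costs quadratic memory, so this sketch is only suitable for small inputs.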

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1707/10942464/51eab00686c5/nihpp-2024.03.07.583975v1-f0001.jpg
