MDSCAN: RMSD-based HDBSCAN clustering of long molecular dynamics.

Authors

González-Alemán Roy, Platero-Rochart Daniel, Rodríguez-Serradet Alejandro, Hernández-Rodríguez Erix W, Caballero Julio, Leclerc Fabrice, Montero-Cabrera Luis

Affiliations

Laboratorio de Química Computacional y Teórica (LQCT), Facultad de Química, Universidad de La Habana, La Habana 10400, Cuba.

Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Saclay, Gif-sur-Yvette F-91198, France.

Publication information

Bioinformatics. 2022 Nov 30;38(23):5191-5198. doi: 10.1093/bioinformatics/btac666.

DOI:10.1093/bioinformatics/btac666
PMID:36205607
Abstract

MOTIVATION

The term clustering designates a comprehensive family of unsupervised learning methods that group similar elements into sets called clusters. Geometrical clustering of molecular dynamics (MD) trajectories is a well-established analysis for gaining insight into the conformational behavior of simulated systems. However, popular variants collapse when processing relatively long trajectories because of their quadratic memory or time complexity. From the arsenal of clustering algorithms, HDBSCAN stands out as a hierarchical density-based alternative that provides robust differentiation of intimately related elements from noise data. Although a very efficient implementation of this algorithm is available for programming-skilled users (HDBSCAN*), it cannot treat long trajectories under the de facto molecular similarity metric RMSD.
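To make the quadratic memory cost concrete, the following minimal sketch estimates the storage needed for all pairwise distances of a trajectory. The function name `pairwise_matrix_bytes` and its parameters are hypothetical, chosen for illustration only. The estimate covers the distance matrix alone, not the full working memory of any particular clustering implementation.

```python
def pairwise_matrix_bytes(n_frames: int, bytes_per_value: int = 8,
                          condensed: bool = False) -> int:
    """Bytes needed to store every pairwise distance between n_frames frames.

    condensed=False counts the full square matrix (n * n values);
    condensed=True counts only the upper triangle (n * (n - 1) / 2 values).
    bytes_per_value=8 assumes double-precision floats (an assumption).
    """
    if condensed:
        n_values = n_frames * (n_frames - 1) // 2
    else:
        n_values = n_frames * n_frames
    return n_values * bytes_per_value


if __name__ == "__main__":
    # Memory grows with the square of the trajectory length.
    for n in (10_000, 100_000, 1_000_000):
        full = pairwise_matrix_bytes(n)
        half = pairwise_matrix_bytes(n, condensed=True)
        print(f"{n:>9,} frames: full {full / 1e12:.4f} TB, "
              f"condensed {half / 1e12:.4f} TB")
```

Under these assumptions, a tenfold increase in frames costs a hundredfold increase in storage, which is why approaches that avoid materializing the matrix matter for long trajectories.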

RESULTS

Here, we propose MDSCAN, an HDBSCAN-inspired software tool designed for non-programmer users to perform memory-efficient RMSD-based clustering of long MD trajectories. Methodological improvements over the original version include the encoding of trajectories as a particular class of vantage-point tree (decreasing time complexity), and a dual-heap approach to construct a quasi-minimum spanning tree (reducing memory complexity). MDSCAN was able to process a trajectory of 1 million frames using the RMSD metric in about 21 h with <8 GB of RAM, a task that would have taken a similar time but more than 32 TB of RAM with the accelerated HDBSCAN* implementation generally used.
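The abstract does not specify which class of vantage-point tree MDSCAN uses, so the following is only a generic textbook sketch of the data structure, not the authors' implementation. It shows how a vantage-point tree answers nearest-neighbor queries with a distance function alone, pruning branches through the triangle inequality instead of storing all pairwise distances. Euclidean distance on toy 3D points stands in for RMSD between frames. All names (`Node`, `build_vp_tree`, `nearest`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Stand-in metric; a real MD use case would plug in RMSD between frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Node:
    vantage: int                # index of the vantage point in the item list
    radius: float               # median distance from the vantage point
    inside: Optional["Node"]    # items at distance <= radius
    outside: Optional["Node"]   # items at distance >= radius


def _build(items: List[Point], indices: List[int], metric: Metric,
           rng: random.Random) -> Optional[Node]:
    if not indices:
        return None
    indices = list(indices)
    vp = indices.pop(rng.randrange(len(indices)))  # random vantage point
    if not indices:
        return Node(vp, 0.0, None, None)
    dists = sorted((metric(items[vp], items[i]), i) for i in indices)
    mid = len(dists) // 2
    radius = dists[mid][0]
    inner = [i for _, i in dists[:mid]]
    outer = [i for _, i in dists[mid:]]
    return Node(vp, radius,
                _build(items, inner, metric, rng),
                _build(items, outer, metric, rng))


def build_vp_tree(items: List[Point], metric: Metric, seed: int = 0) -> Optional[Node]:
    """Build a tree over all items; only O(n) node records are stored."""
    return _build(items, list(range(len(items))), metric, random.Random(seed))


def nearest(node: Optional[Node], items: List[Point], query: Point, metric: Metric,
            best: Tuple[float, int] = (math.inf, -1)) -> Tuple[float, int]:
    """Return (distance, index) of the item closest to query."""
    if node is None:
        return best
    d = metric(query, items[node.vantage])
    if d < best[0]:
        best = (d, node.vantage)
    if d < node.radius:
        # Query lies inside the ball: search inside first.
        best = nearest(node.inside, items, query, metric, best)
        if d + best[0] >= node.radius:   # outside may still hold a closer item
            best = nearest(node.outside, items, query, metric, best)
    else:
        best = nearest(node.outside, items, query, metric, best)
        if d - best[0] <= node.radius:   # inside may still hold a closer item
            best = nearest(node.inside, items, query, metric, best)
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
    tree = build_vp_tree(points, euclidean)
    dist, idx = nearest(tree, points, (0.5, 0.5, 0.5), euclidean)
    print(f"nearest item: index {idx}, distance {dist:.4f}")
```

The pruning tests are only valid for a distance that satisfies the triangle inequality, so any substitute for `euclidean` must have that property.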

AVAILABILITY AND IMPLEMENTATION

The source code and documentation of MDSCAN are free and publicly available on GitHub (https://github.com/LQCT/MDScan.git) and as a PyPI package (https://pypi.org/project/mdscan/).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

