UMAP as a Dimensionality Reduction Tool for Molecular Dynamics Simulations of Biomacromolecules: A Comparison Study.

Affiliations

Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States.

Department of Statistical Science, Southern Methodist University, Dallas, Texas 75275, United States.

Publication information

J Phys Chem B. 2021 May 20;125(19):5022-5034. doi: 10.1021/acs.jpcb.1c02081. Epub 2021 May 11.

Abstract

Proteins are the molecular machines of life. The many conformations a protein can adopt determine its free-energy landscape. However, the inherently high dimensionality of a protein free-energy landscape makes it difficult to decipher how proteins perform their functions. For this reason, dimensionality reduction is an active field of research for molecular biologists. Uniform manifold approximation and projection (UMAP) is a dimensionality reduction method based on a fuzzy topological analysis of data. In the present study, the performance of UMAP is compared with that of other popular dimensionality reduction methods, namely t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and time-structure based independent component analysis (tICA), in the context of analyzing molecular dynamics simulations of the circadian clock protein VIVID. A good dimensionality reduction method should accurately represent the structure of the data on the projected components. Comparing the raw high-dimensional data with the projections obtained from each method, using various metrics, showed that UMAP outperforms the linear reduction methods (PCA and tICA) and offers competitive performance at a scalable computational cost.
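The abstract contrasts variance-based PCA with kinetics-based tICA and scores each projection against the raw high-dimensional data. The sketch below illustrates both ideas under stated assumptions. It uses NumPy only and replaces the VIVID trajectory with a hypothetical synthetic 10-feature toy trajectory. It also uses a simple k-nearest-neighbor overlap score as an illustrative quality metric, which is not necessarily one of the metrics used in the study. UMAP and t-SNE are omitted because they need third-party packages (e.g. umap-learn, scikit-learn). The helper names `pca`, `tica`, and `knn_overlap` are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Project X (frames x features) onto its largest-variance directions."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    evals, evecs = np.linalg.eigh(cov)                # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    return Xc @ evecs[:, order]


def tica(X, lag=1, n_components=2):
    """Project X onto the directions with the highest autocorrelation at `lag`."""
    Xc = X - X.mean(axis=0)
    A, B = Xc[:-lag], Xc[lag:]
    C0 = (A.T @ A + B.T @ B) / (2 * len(A))           # instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2 * len(A))           # symmetrized time-lagged covariance
    w, V = np.linalg.eigh(C0)
    W = V @ np.diag(1.0 / np.sqrt(w)) @ V.T           # whitening matrix C0^(-1/2)
    evals, evecs = np.linalg.eigh(W @ Ct @ W)         # diagonalize in whitened space
    order = np.argsort(evals)[::-1][:n_components]
    return Xc @ (W @ evecs[:, order])


def knn_overlap(X, Y, k=10):
    """Mean fraction of each frame's k nearest neighbors in X that stay neighbors in Y."""
    def knn(Z):
        sq = (Z ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T    # squared pairwise distances
        np.fill_diagonal(d2, np.inf)                      # a frame is not its own neighbor
        return np.argsort(d2, axis=1)[:, :k]
    nx, ny = knn(X), knn(Y)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nx, ny)]))


# Hypothetical toy "trajectory": one slow two-state coordinate, one fast
# high-variance coordinate, and 8 low-variance noise features.
rng = np.random.default_rng(0)
n = 2000
state = np.cumsum(rng.random(n) < 0.01) % 2           # rare hops between two states
slow = np.where(state == 1, 1.0, -1.0) + 0.1 * rng.standard_normal(n)
fast = 3.0 * rng.standard_normal(n)                    # large variance, no memory
X = np.column_stack([slow, fast, 0.1 * rng.standard_normal((n, 8))])

P = pca(X)
T = tica(X, lag=1)
print("PC1 vs fast:", abs(np.corrcoef(P[:, 0], fast)[0, 1]))
print("tIC1 vs slow:", abs(np.corrcoef(T[:, 0], slow)[0, 1]))
print("kNN overlap, PCA:", knn_overlap(X, P), "tICA:", knn_overlap(X, T))
```

On this toy trajectory the first principal component should follow the large-variance but temporally uncorrelated `fast` coordinate. The first tICA component should instead follow the slowly switching two-state `slow` coordinate. This is the kind of disagreement between variance and kinetics that makes comparing several reduction methods worthwhile.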
