AMUSET-TICA: A Tensor-Based Approach for Identifying Slow Collective Variables in Biomolecular Dynamics.

Author information

Cao Siqin, Nüske Feliks, Liu Bojun, Soley Micheline B, Huang Xuhui

Affiliations

Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States.

Max-Planck-Institute for Dynamics of Complex Technical Systems, Magdeburg 39106, Germany.

Publication information

J Chem Theory Comput. 2025 May 13;21(9):4855-4866. doi: 10.1021/acs.jctc.5c00076. Epub 2025 Apr 20.

DOI: 10.1021/acs.jctc.5c00076

PMID: 40254940

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12182257/

Abstract

Elucidating collective variables (CVs) for biomolecular dynamics is crucial for understanding numerous biological processes. By leveraging the tensor-train data structure, a multilinear version of the AMUSE (Algorithm for Multiple Unknown Signals) algorithm for Koopman approximation (AMUSEt) was recently developed to identify CVs for biomolecular dynamics. To find slow CVs, AMUSEt transforms input features (e.g., pairwise atomic distances) into nonlinear basis functions (e.g., Gaussian functions) and encodes these nonlinear basis functions within a tensor-train structure via time-lagged correlation functions. Due to the need to fit these tensor-train data structures into computer memory, AMUSEt can handle only a limited number of input features. Consequently, AMUSEt relies on manually selecting and ranking features based on physical intuition to fully capture the slow dynamics. However, when applied to complex biological systems with numerous features, this selection and ranking process becomes increasingly challenging. To address this challenge, here we present AMUSET-TICA (AMUSEt-based Time-lagged Independent Component Analysis), a CV-identification method using time-structure-independent components (tICs) as the input features for AMUSEt. The key insight of AMUSET-TICA lies in its highly effective embedding of high-dimensional atomistic protein conformations, achieved by expanding orthogonal tICs into overlapping Gaussian basis functions through a tensor-product data structure. This eliminates the need for manually selecting and ranking input features for a wide range of biomolecular systems. We demonstrate that AMUSET-TICA consistently and significantly outperforms AMUSEt and tICA in identifying slow CVs for three different biomolecular systems: alanine dipeptide, the N-terminal domain of L9 (NTL9), and the FIP35 WW domain. 
For all these systems, the CVs generated by AMUSET-TICA accurately describe the slowest dynamical modes underlying these biological conformational changes. Furthermore, we show that AMUSET-TICA achieves performance comparable to deep-learning approaches like VAMPnets in identifying the slowest dynamical modes, while being significantly more computationally efficient in terms of CPU time. In addition, the CVs yielded by AMUSET-TICA provide insights into the folding mechanisms of NTL9 and the FIP35 WW domain, including CV3 and CV4 of the WW domain, which capture its two parallel folding pathways. We expect AMUSET-TICA can be widely applied to facilitate the investigation of biomolecular dynamics.
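
The pipeline described in the abstract (compute tICs, then expand each tIC into overlapping Gaussian basis functions combined through a tensor product) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy trajectory, the Gaussian centers and width, and the symmetrization step (which assumes reversible dynamics) are all assumptions chosen for illustration, and the full product basis is built explicitly here rather than in the tensor-train format that AMUSEt uses.

```python
import numpy as np

def amuse_tica(X, lag, n_components):
    """Linear tICA in the AMUSE style: whiten the data, then
    diagonalize the time-lagged covariance of the whitened data."""
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    C0 = X0.T @ X0 / len(X0)
    # Whitening transform from the instantaneous covariance.
    evals, evecs = np.linalg.eigh(C0)
    keep = evals > 1e-10 * evals.max()
    W = evecs[:, keep] / np.sqrt(evals[keep])
    Y0, Yt = X0 @ W, Xt @ W
    Ct = Y0.T @ Yt / len(Y0)
    Ct = 0.5 * (Ct + Ct.T)  # assumption: reversible dynamics
    lam, V = np.linalg.eigh(Ct)
    order = np.argsort(lam)[::-1][:n_components]
    return lam[order], W @ V[:, order]

def gaussian_basis(t, centers, width):
    """Expand one tIC, shape (T,), into Gaussians, shape (T, n_centers)."""
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def product_basis(tics, centers, width):
    """Explicit tensor product of the per-tIC Gaussian bases,
    shape (T, n_centers ** d). Feasible only for small d."""
    T, d = tics.shape
    out = np.ones((T, 1))
    for k in range(d):
        G = gaussian_basis(tics[:, k], centers, width)
        out = (out[:, :, None] * G[:, None, :]).reshape(T, -1)
    return out

# Toy trajectory (hypothetical): one slow AR(1) mode mixed with fast noise.
rng = np.random.default_rng(0)
T = 5000
slow = np.zeros(T)
for i in range(1, T):
    slow[i] = 0.99 * slow[i - 1] + 0.1 * rng.standard_normal()
fast = rng.standard_normal((T, 2))
X = np.column_stack([slow + 0.1 * fast[:, 0], fast[:, 1], slow - fast[:, 0]])

lam, U = amuse_tica(X, lag=10, n_components=2)
tics = (X - X.mean(axis=0)) @ U          # tICs, roughly unit variance
centers = np.linspace(-2.0, 2.0, 5)      # 5 overlapping Gaussians per tIC
Phi = product_basis(tics, centers, width=1.0)
print(lam, Phi.shape)
```

The explicit product basis grows as `n_centers ** d`: with 5 Gaussians on each of 10 tICs it would already have 5**10 = 9,765,625 columns per frame. That growth is why a compressed representation such as the tensor-train format is needed, and why the number of input features that fit in memory is limited.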
