Scaling Graph Neural Networks to Large Proteins.

Author information

Airas Justin, Zhang Bin

Affiliation

Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, United States.

Publication information

J Chem Theory Comput. 2025 Feb 25;21(4):2055-2066. doi: 10.1021/acs.jctc.4c01420. Epub 2025 Feb 6.

Abstract

Graph neural network (GNN) architectures have emerged as promising force field models, exhibiting high accuracy in predicting complex energies and forces based on atomic identities and Cartesian coordinates. To expand the applicability of GNNs, and machine learning force fields more broadly, optimizing their computational efficiency is critical, especially for large biomolecular systems in classical molecular dynamics simulations. In this study, we address key challenges in existing GNN benchmarks by introducing a dataset, DISPEF, which comprises large, biologically relevant proteins. DISPEF includes 207,454 proteins with sizes up to 12,499 atoms and features diverse chemical environments, spanning folded and disordered regions. The implicit solvation free energies, used as training targets, represent a particularly challenging case due to their many-body nature, providing a stringent test for evaluating the expressiveness of machine learning models. We benchmark the performance of seven GNNs on DISPEF, emphasizing the importance of directly accounting for long-range interactions to enhance model transferability. Additionally, we present a novel multiscale architecture, termed Schake, which delivers transferable and computationally efficient energy and force predictions for large proteins. Our findings offer valuable insights and tools for advancing GNNs in protein modeling applications.
