CGsmiles: A Versatile Line Notation for Molecular Representations across Multiple Resolutions.

Author information

Grünewald Fabian, Seute Leif, Alessandri Riccardo, König Melanie, Kroon Peter C

Affiliations

Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany.

Interdisciplinary Center for Scientific Computing, Heidelberg University, 69120 Heidelberg, Germany.

Publication information

J Chem Inf Model. 2025 Apr 14;65(7):3405-3419. doi: 10.1021/acs.jcim.5c00064. Epub 2025 Mar 24.

Abstract

Coarse-grained (CG) models simplify molecular representations by grouping multiple atoms into effective particles, enabling faster simulations and reducing the chemical compound space compared to atomistic methods. Additionally, models with chemical specificity, such as Martini, may extrapolate to cases where experimental data is scarce, making CG methods highly promising for high-throughput (HT) screenings and chemical space exploration. Yet no rigorous data formats exist for the crucial aspect of describing how the atoms are grouped (i.e., the mapping). As CG models advance toward true HT capabilities, the lack of mappings and indexing capabilities for the growing number of CG molecules poses a significant barrier. To address this, we introduce CGsmiles, a versatile line notation inspired by the popular Simplified Molecular Input Line Entry System (SMILES) and BigSMILES. CGsmiles encodes the molecular graph and particle (atom) properties independent of their resolution and incorporates a framework that allows seamless conversion between coarse- and fine-grained resolutions. By specifying fragments that describe how each particle is represented at the next finer resolution (e.g., CG particles to atoms), CGsmiles can represent multiple resolutions and their hierarchical relationships in a single string. In this paper, we present the CGsmiles syntax and analyze a benchmark set of 407 molecules from the Martini force field. We highlight key features missing in existing notations that are essential for accurately describing CG models. To demonstrate the utility of CGsmiles beyond simulations, we construct two simple machine-learning models for predicting partition coefficients, both trained on CGsmiles-indexed data and leveraging information from both CG and atomistic resolutions. Finally, we briefly discuss the applicability of CGsmiles to polymers, which particularly benefit from the multiresolution nature of the notation.
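The core idea above, that each CG particle is described by a fragment at the next finer resolution and the fragments are joined along the CG bonds, can be illustrated with a minimal Python sketch. It does not parse CGsmiles strings and is not the API of any CGsmiles implementation. The `Fragment` class, the `backmap` function, and the single `anchor` atom per fragment are hypothetical simplifications. A real notation must also record exactly which atoms bond across fragments; here that is reduced to one anchor atom per bead type.

```python
from dataclasses import dataclass

# Hypothetical toy model of a two-resolution description.
# This is NOT CGsmiles syntax and NOT the API of any CGsmiles package.


@dataclass
class Fragment:
    """Fine-grained (atomistic) fragment that represents one CG bead type."""
    elements: list[str]            # element symbol of each atom in the fragment
    bonds: list[tuple[int, int]]   # bonds between local atom indices
    anchor: int                    # simplification: the one local atom that bonds to neighboring beads


def backmap(beads, cg_bonds, fragments):
    """Expand a CG graph into an atom graph plus a bead -> atom-index mapping."""
    elements, bonds, mapping = [], [], {}
    # Place each bead's fragment, shifting local atom indices to global ones.
    for bead_idx, bead_type in enumerate(beads):
        frag = fragments[bead_type]
        offset = len(elements)
        elements.extend(frag.elements)
        bonds.extend((offset + i, offset + j) for i, j in frag.bonds)
        mapping[bead_idx] = list(range(offset, offset + len(frag.elements)))
    # Each CG bond becomes one atomistic bond between the two anchor atoms.
    for a, b in cg_bonds:
        atom_a = mapping[a][fragments[beads[a]].anchor]
        atom_b = mapping[b][fragments[beads[b]].anchor]
        bonds.append((atom_a, atom_b))
    return elements, bonds, mapping


# Toy example: two bonded beads that expand to a C-C-C-O chain (1-propanol heavy atoms).
fragments = {
    "A": Fragment(elements=["C", "C"], bonds=[(0, 1)], anchor=1),
    "B": Fragment(elements=["C", "O"], bonds=[(0, 1)], anchor=0),
}
elements, bonds, mapping = backmap(["A", "B"], [(0, 1)], fragments)
print(elements)  # ['C', 'C', 'C', 'O']
print(bonds)     # [(0, 1), (2, 3), (1, 2)]
print(mapping)   # {0: [0, 1], 1: [2, 3]}
```

In this toy example, two beads expand into a four-atom C-C-C-O chain, and `mapping` records which atoms each bead stands for, which is the kind of bead-to-atom information the abstract refers to as the mapping.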
