A FAIR-Compliant Management Solution for Molecular Simulation Trajectories.

Authors

Vitalis Andreas, Winkler Steffen, Zhang Yang, Widmer Julian, Caflisch Amedeo

Affiliation

Department of Biochemistry, University of Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland.

Publication details

J Chem Inf Model. 2025 Mar 10;65(5):2443-2455. doi: 10.1021/acs.jcim.4c01301. Epub 2025 Feb 20.

Abstract

Simulation studies of molecules primarily produce data that represent the configuration of the system as a function of the progress variable, usually time. Because of the high-dimensional nature of these data, which grow very quickly, compromises are often necessary and achieved by storing only a subset of the system's components, for example, stripping solvent, and by restricting the time resolution to a scale significantly coarser than the basic time step of the simulation. The resultant trajectories thus describe the essentially stochastic evolution of the molecules of interest. Maintaining their interpretability through metadata is of interest not only because they can aid researchers interested in specific systems but also for reproducibility studies and model refinement. Here, we introduce a standard for the storage of data created by molecular simulations that improves compliance with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. We describe a solution conceived in PostgreSQL, along with reference implementations, that provides stringent links between metadata and raw data, which is a major weakness of the established file formats used for storing these data. A possible structure for the logic of SQL queries is included along with salient performance testing. To close, we suggest that a PostgreSQL-based storage of simulation data, in particular when coupled to a visual user interface, can improve the FAIR compliance of molecular simulation data at all levels of visibility, and a prototype solution for accomplishing this is presented.
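The idea of a stringent link between metadata and raw data can be illustrated with a small relational sketch. The example below is not the authors' schema or reference implementation: the table names (`simulation`, `frame`), the column names, and the helper functions are all hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL so that the block runs without a database server. It only shows the general technique: a foreign-key constraint makes it impossible to store a trajectory frame without a matching metadata record, and a join recovers physical time from the stored frame index.

```python
import sqlite3
import struct


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE simulation (              -- metadata (assumed columns)
            sim_id           INTEGER PRIMARY KEY,
            system_name      TEXT    NOT NULL,
            force_field      TEXT    NOT NULL,
            save_interval_ps REAL    NOT NULL,
            solvent_stripped INTEGER NOT NULL
        );
        CREATE TABLE frame (                   -- raw data, one row per snapshot
            sim_id      INTEGER NOT NULL
                        REFERENCES simulation(sim_id) ON DELETE CASCADE,
            frame_index INTEGER NOT NULL,
            coords      BLOB    NOT NULL,
            PRIMARY KEY (sim_id, frame_index)
        );
        """
    )
    return conn


def add_simulation(conn, system_name, force_field, save_interval_ps, solvent_stripped):
    """Insert one metadata record and return its generated identifier."""
    cur = conn.execute(
        "INSERT INTO simulation "
        "(system_name, force_field, save_interval_ps, solvent_stripped) "
        "VALUES (?, ?, ?, ?)",
        (system_name, force_field, save_interval_ps, int(solvent_stripped)),
    )
    return cur.lastrowid


def add_frame(conn, sim_id, frame_index, flat_coords):
    """Store one snapshot as packed doubles; fails if sim_id has no metadata."""
    blob = struct.pack(f"{len(flat_coords)}d", *flat_coords)
    conn.execute(
        "INSERT INTO frame (sim_id, frame_index, coords) VALUES (?, ?, ?)",
        (sim_id, frame_index, blob),
    )


def frame_times(conn, sim_id):
    """Join raw data to metadata to recover the time of each stored frame."""
    rows = conn.execute(
        "SELECT f.frame_index, f.frame_index * s.save_interval_ps "
        "FROM frame AS f JOIN simulation AS s USING (sim_id) "
        "WHERE s.sim_id = ? ORDER BY f.frame_index",
        (sim_id,),
    )
    return rows.fetchall()
```

In this sketch, calling `add_frame` with a `sim_id` that has no row in `simulation` raises `sqlite3.IntegrityError`, which is the kind of guarantee a standalone trajectory file with a separate metadata file cannot give. A PostgreSQL version would use the same `REFERENCES` clause, with foreign keys enforced by default.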

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f99/11898051/70137891ad8a/ci4c01301_0001.jpg
