Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States.
J Phys Chem B. 2024 Mar 14;128(10):2219-2227. doi: 10.1021/acs.jpcb.3c04823. Epub 2024 Feb 28.
Molecular dynamics (MD) simulations have become a valuable tool in structural biology, offering insights into complex biological systems that are difficult to obtain through experimental techniques alone. The lack of available data sets and structures in most published computational work has limited other researchers' use of these models. In recent years, the emergence of online sharing platforms and MD database initiatives has favored the deposition of ensembles and structures to accompany publications, encouraging reuse of the data sets. However, the lack of uniformity in metadata collection, formats, and which data are deposited limits the impact of these data and their use by communities that are not necessarily experts in MD. This Perspective highlights the need for standardization and better resource sharing for processing and interpreting MD simulation results, akin to efforts in other areas of structural biology. As the field moves forward, we will see MD-based integrative approaches, which combine experimental data and simulations through probabilistic reasoning, grow in popularity and benefit, but these too are limited by the lack of uniformity in experimental data availability and by modeling choices that are not trivial to decipher from papers. Other fields have addressed similar challenges comprehensively by establishing task forces, with different degrees of success. The large scope of MD and the number of communities needed to represent the breadth of simulation types complicate a parallel approach that would fit all. Thus, each group typically decides what data to upload, and in which format, on servers like Zenodo. Uploading data with FAIR (findable, accessible, interoperable, reusable) principles in mind, including optimal metadata collection, will make the data more accessible and actionable by the community. Such a wealth of simulation data will foster method development and infrastructure advancements, thus propelling the field forward.